Abstract



We present a new method for proving lower bounds on the expected running time of evolutionary
algorithms. It is based on fitness-level partitions and an additional condition on transition probabilities
between fitness levels. The method is versatile, intuitive, elegant, and very powerful. It yields exact or
near-exact lower bounds for LO, OneMax, long k-paths, and all functions with a unique optimum. Most
lower bounds are very general: they hold for all evolutionary algorithms that only use bit-flip mutation
as variation operator, i.e., for all selection operators and population models. The lower bounds are
stated with their dependence on the mutation rate.

These results have very strong implications. They allow us to determine the optimal mutation-based
algorithm for LO and OneMax, i.e., the algorithm that minimizes the expected number of fitness evaluations.
This includes the choice of the optimal mutation rate.



1 Introduction 




Evolutionary algorithms (EAs) and other randomized search heuristics have been successfully applied to
countless difficult practical problems. One important reason for their popularity and their success is that
they can be applied to a broad range of problems. They are usually easy to implement and they typically
produce reasonable results in short time, with little effort.

However, getting the best possible results requires much greater effort. When aiming for maximum 
efficiency, one has to think carefully about what search algorithm to use, how to make design choices, and 
how to tune the parameters of the algorithm. In the search for the best strategy, researchers and practitioners 
alike are faced with a range of fundamental questions: 

• How effective is search algorithm A on problem/problem class P?

• What is the best parameter setting for A on P?

• Is search algorithm B faster than A on P?

• What is the best search algorithm for P?

Finding answers to these questions is now more pressing than ever. The field of evolutionary computation
has grown immensely in the last decades and it has led to the development of countless variants of search 
algorithms, with new bio-inspired optimization paradigms emerging every year. This can prove to be a 
burden as practitioners are faced with an overwhelming variety of search algorithms. 

Running time analysis has emerged as an important and very active area in evolutionary computation.
The goal is to formally analyze the random or expected time until an evolutionary algorithm has found
a satisfactory solution for a given problem. By assessing how the expected running time grows with the
problem dimension, we can gain valuable insights into the scalability of these algorithms. These insights
apply to arbitrary, not too small problem dimensions, even to very large dimensions that are beyond the
capabilities of today's hardware.

*A preliminary version with parts of the results has been presented at a conference [41]. The results therein were limited to
mutation rate 1/n.

Running time analysis also yields a solid foundation for the comparison of different EAs or different
heuristic paradigms. This includes the question of how far design choices, such as the choice of
representations, operators, and parameters, affect performance. In some cases running time analyses allow
us to draw conclusions about optimal parameter settings, so that some of the above questions can be answered.
Last but not least, theoretical analyses lead to insight into the working principles of EAs and to a better
understanding of their behavior.

Running time analyses have been performed for classes of pseudo-Boolean functions such as unimodal
functions [15], linear functions [151 ITS! l2"0l ITU1 I5T], functions with plateaus [33], monotone polynomials [48],
and monotone functions [8]. The same approach has been used for the analysis of problems from
combinatorial optimization, see the survey by Oliveto, He, and Yao [34] or the recent textbook by Neumann
and Witt [33]. Many other metaheuristics have also been studied, such as memetic algorithms [101 EH US],
estimation-of-distribution algorithms [131 3], ant colony optimization [T71 |35J 1301 E3], particle swarm
optimization [45, 5H|, and artificial immune systems [52, 23]. A good summary of recent developments is given
in the edited book by Auger and Doerr [1].

However, running time analysis comes with several drawbacks. In many cases running time analyses 
are very challenging. Search heuristics represent complex dynamic systems that are often hard to handle 
analytically. Hence, studies have often been limited to very specific settings. Comparisons between different 
search algorithms — or variants of the same algorithm — have often been performed on contrived artificial 
functions that were designed specifically to enable an analysis. 

Furthermore, many analyses are restricted to a single, very specific algorithm such as the (1+1) EA with 
mutation probability 1/n. This helps to keep the analyses simple, but it also means that conclusions are 
limited to this particular algorithm. Another shortcoming is that, when considering polynomial expected 
running times, often only upper bounds on the expected running time are shown. Upper bounds are more 
appealing than lower bounds as they show that a particular search algorithm is effective on a particular 
problem. Lower bounds are typically harder to prove and often more imprecise, compared to upper bounds. 
For example, for the function OneMax an upper bound with the exact leading constant (i.e., the constant 
factor preceding the fastest-growing term) is known from the 1990s (see Rudolph [38, page 95]). But a 
matching lower bound with the same leading constant was only proved recently, in 2010, by Doerr, Fouz, 
and Witt [5] . 

When only upper bounds are available it is hard to make comparisons between different algorithms. Even 
when an upper bound for search algorithm A is much lower than an upper bound for B, we cannot conclude 
with rigor that A is more efficient than B. It could be that the analysis for A is more precise than that for B, 
but in fact B is more efficient than A. One has to take care not to draw wrong conclusions when interpreting 
running time bounds. Only if we have a lower bound for B that is larger than the upper bound for A we 
can say with certainty that A is more effective than B. This stresses the importance of lower bounds, and 
that of having precise running time bounds. 

Many researchers have tried to develop methods for proving lower bounds. Drift analysis has emerged 
as one powerful tool [3S1 [TH1 [HI HH1 0- However, it is not always easy to apply. We present a new method 
for proving lower bounds on the running time of stochastic search algorithms (see Section 3). It follows the
idea of fitness-based partitions or fitness levels, a well-known tool for proving upper running time bounds. 
The idea is to partition the search space into a sequence of sets called fitness levels. These sets have to be 
traversed in order to find a global optimum. Lower bounds can be derived if we have upper bounds on the 
probability of reaching a better fitness level and additional information about the transition probabilities 
between fitness levels. 

The method is illustrated with applications to well-studied test problems. The function
$\mathrm{OneMax}(x) := \sum_{i=1}^{n} x_i$ counts the number of ones in the bit string. The optimum is the
all-ones bit string. Assessing the performance of a search algorithm on OneMax amounts to asking how
effective the algorithm is at hill climbing, and at finding a particular target point if the best possible
hints are given. The function LeadingOnes, or LO for short,
$\mathrm{LO}(x) := \sum_{i=1}^{n} \prod_{j=1}^{i} x_j$, is another popular test function that counts the number of leading ones in
the bit string. All bits have to be optimized sequentially. This gives an example of a unimodal function that 
is more difficult than OneMax. It also resembles worst-case inputs for shortest path problems [33]. Long
k-paths [H] |3§] H51 ED] represent even more difficult unimodal functions where EAs typically climb up a
path. As the path can have exponential length and shortcuts are unlikely, this is a very challenging problem.
For details we refer to Section 8.
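
The two test functions defined above can be written down directly. The following is a minimal sketch; the function names `onemax` and `leading_ones` and the representation of bit strings as Python lists of 0/1 integers are choices made here for illustration.

```python
from typing import Sequence


def onemax(x: Sequence[int]) -> int:
    """OneMax(x): the number of one-bits in the bit string x."""
    return sum(x)


def leading_ones(x: Sequence[int]) -> int:
    """LO(x): the length of the longest prefix of x that consists only of ones."""
    count = 0
    for bit in x:
        if bit != 1:
            break  # the first zero ends the block of leading ones
        count += 1
    return count
```

For example, the string 1101 has OneMax value 3 but LO value 2, since only the first two bits form an unbroken block of ones.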

The example applications show that the new method is applicable to a wide range of problems and to
a very broad class of evolutionary algorithms. We introduce the term mutation-based EAs for a class of
EAs that first generate initial search points uniformly at random, and afterwards only use common bit-flip
mutation operators for variation. This class contains all common EAs that do not use crossover, e.g., all
($\mu$+$\lambda$) EAs, all ($\mu$,$\lambda$) EAs as well as parallel variants such as island models. Basically, the class contains
all EAs regardless of the selection operators and population models (see Section 2).

The resulting lower bounds apply to all mutation-based EAs. They are not only tight in an asymptotic 
sense. They contain best possible leading constants when compared to upper bounds for the best EAs in 
this class, up to lower-order terms. The bounds also show how the expected running time depends on the 
mutation rate. This highlights the impact of this parameter on performance and it allows for conclusions 
on the optimal mutation rate. Along the way, we also present a refinement of the fitness-level method for 
proving upper bounds in Section 4.

The results allow us to draw conclusions about optimal EAs, where optimality is regarded as minimizing the
expected number of function evaluations. A summary of the results derived from applying the new method
is as follows. (1+1) EA$_\mu$ denotes a variant of the (1+1) EA initialized with a best out of $\mu$ individuals
generated uniformly at random.

• For LO we get a lower bound for all mutation-based EAs, see Section 5. This bound equals a refined
upper bound for the (1+1) EA$_\mu$. For all $\mu$ we get an exact formula for the expected running time of
the (1+1) EA$_\mu$, including the (1+1) EA. Together with the independent work by Böttcher, Doerr, and
Neumann [3], this is the first time that an exact formula for an expected running time of an EA can
be determined. Following [3], the optimal mutation rate can be computed as $p \approx 1.59/n$. The optimal
mutation-based EA turns out to be the (1+1) EA$_\mu$ for some value $\mu > 1$.

• For OneMax we also get a lower bound for all mutation-based EAs, see Section 6. For all reasonable
mutation rates the lower bound matches an upper bound for the (1+1) EA using the same mutation
rate, up to terms of smaller order. The optimal mutation rate turns out to be $p = 1/n$ (see also
Witt [51]). The optimal mutation-based EA is again the (1+1) EA$_\mu$ for a proper $\mu > 1$.

• The above lower bound on OneMax generalizes to the very large class of functions that have a unique
optimum, see Section 7. This is based on the structural insight that for all mutation-based EAs finding
a single target point for any problem is never easier than optimizing OneMax.

• For long k-paths we get upper and lower bounds that match up to smaller order terms, for all reasonable
mutation rates, when considering the (1+1) EA starting on the first point on the path, see Section 8.
As for OneMax, $p = 1/n$ is the optimal mutation rate.

In addition to these remarkably powerful results, the method is easy to describe and it has a simple, direct 
proof. As such, it is well suited for teaching purposes and it shows that precise lower bounds can be obtained 
without using drift analysis. 

1.1 Previous and Related Work 

There is a long history of results on pseudo-Boolean optimization. We review results on lower bounds and
also describe work that preceded, relates to, or has followed from this work [41].

Droste, Jansen, and Wegener [15] already presented a lower bound of $\Omega(n \log n)$ for the (1+1) EA on
every n-bit pseudo-Boolean function with unique global optimum. The constant factor preceding the
$n \log n$-term is $1/2 \cdot (1 - e^{-1/2}) \approx 0.196$. Wegener [47] mentions a lower bound
$(1 - \varepsilon) \cdot n \ln n - cn$ where $\varepsilon > 0$ is an arbitrarily small constant and the constant
$c > 0$ depends on $\varepsilon$. Doerr, Fouz, and Witt [3] presented a lower
bound $(1 - o(1)) e n \ln n$ for the (1+1) EA on OneMax. The last result was extended by Doerr, Johannsen,
and Winzen [10]. They proved that the same bound holds for the (1+1) EA on every function with a unique
global optimum.

Later on, Doerr, Fouz, and Witt [7] were inspired by the lower bound $en \ln n - 2 \log\log n - 16n$ for
mutation-based EAs with $p = 1/n$ on OneMax in the preliminary work [41]. Their goal was to remove the
$2 \log\log n$-term in order to arrive at an even more precise bound for the (1+1) EA. They managed to get a lower
bound of $en \ln n - O(n)$ and along the way they introduced two new techniques to the analysis of randomized
search heuristics: lower bounds with variable drift and probability-generating functions.

Witt [51] followed up on this work [41] and presented lower bounds for the class of mutation-based EAs
on linear functions. He proved that the mutation rate $p = 1/n$ is an optimal choice for the (1+1) EA
on linear functions. He also generalized a structural result from [41] in the following sense. The original
statement is that the expected optimization time of any mutation-based EA with mutation probability 1/n
on any function with unique global optimum is at least as large as the expected optimization time of the
(1+1) EA with mutation probability 1/n on OneMax. Witt generalized this towards arbitrary mutation
probabilities $0 < p \le 1/2$ and stochastic dominance. We will discuss and apply this result in Section 7.

The LeadingOnes function has been equally popular. Droste, Jansen, and Wegener [15] showed that the
running time of the (1+1) EA on LO is at least $c_1 n^2$ with probability $1 - 2^{-\Omega(n)}$, for some constant $c_1 > 0$.
Böttcher, Doerr, and Neumann presented an exact formula for the expected running time of the (1+1) EA
on LO at the same conference [3]. While the preliminary version of this work [41] considered mutation rates
of $p = 1/n$ only, the authors considered general mutation rates $p$. Their results were limited to the (1+1) EA
as opposed to all mutation-based EAs. They showed that the optimal fixed mutation rate for LO is not
$p = 1/n$, but a slightly higher value of $p \approx 1.59/n$. In addition, they presented a simple adaptive scheme for
choosing the mutation rate and showed that this leads to even smaller numbers of function evaluations.

These findings show that the often recommended choice p = 1/n is not always optimal. Another reason 
why the choice of the mutation probability is far from settled is that even on a seemingly easy class of 
functions a constant factor in the mutation rate can change a polynomial expected running time into an 
exponential one [8].

Black-box complexity of search algorithms as introduced by Droste, Jansen, and Wegener [16] is another
method for proving lower bounds. These bounds hold for all algorithms in a black-box setting where only
the class of functions to be optimized is known, but the precise instance is hidden from the algorithm. Their
results imply that every black-box algorithm needs at least $\Omega(n/\log n)$ function evaluations to optimize
OneMax and LO (or, to be more precise, straightforward generalizations to function classes). Recently
Lehre and Witt [29] presented a more restricted black-box model. If only unary operators are used (that
is, operators taking a single search point as input, such as mutation) and all operators are unbiased with
respect to bit values and bit positions, every black-box algorithm needs $\Omega(n \log n)$ function evaluations for
every function with a unique global optimum. The constant factor hidden in the $\Omega$-notation is not specified; it is
known to be at most 1. This line of research has been extended subsequently to more general conditions for
unbiasedness [37], higher-arity operators [9] and more restricted black-box models [12].

Investigating conditions for the optimality of search algorithms, Borisovsky and Eremeev [5] introduced 
the concept of dominance for the performance comparison of evolutionary algorithms. For sorting problems 
and the function OneMax they give sufficient conditions on when the (1+1) EA is faster than evolutionary 
algorithms with other reproduction operators. 

Recently, drift analysis has received a lot of attention [35] [TH] [TTJ |28j [JJ . Assume a non-negative potential
function such that the optimum is reached only if the potential is 0. If the expected decrease ("drift") of 
the potential in one generation is bounded from below, an upper bound on the expected optimization time 
follows. Conversely, an upper bound on the drift implies lower bounds on the expected optimization time. If 
there is a drift pointing away from the optimum on a part of the potential's domain then exponential lower 
bounds can be shown [35] [35] . 
2 Preliminaries



The presentation in this work is for maximization problems, but it can be easily adapted for minimization. 
For the usage of asymptotic notation we refer to text books such as Cormen, Leiserson, Rivest, and Stein [5]. 

2.1 Mutation-Based Evolutionary Algorithms 

The technique for proving lower bounds will be applied to a very general class of evolutionary algorithms.
It contains all EAs that generate $\mu \in \mathbb{N}$ individuals uniformly at random and afterwards only use standard
mutations to generate offspring (see Algorithm 1).

Mutation is done by flipping each bit independently with some given mutation probability $0 < p \le 1/2$.
The most extreme value $p = 1/2$ corresponds to choosing an offspring uniformly at random, i.e., random
search. We do not consider mutation rates $p > 1/2$ as this choice would favor offspring far away from the
parent, thus contradicting the purpose of mutation.



Algorithm 1 Scheme of a mutation-based EA
  create $\mu$ individuals $x_1, \dots, x_\mu \in \{0,1\}^n$ uniformly at random.
  let $t := \mu$.
  loop
    select a parent $x \in \{x_1, \dots, x_t\}$ according to $t$ and $f(x_1), \dots, f(x_t)$.
    create $x_{t+1}$ by copying $x$ and flipping each bit independently with probability $p$.
    let $t := t + 1$.
  end loop
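
To make the scheme concrete, the following is a minimal sketch of one member of this class: the (1+1) EA initialized with a best out of $\mu$ random individuals, which always selects the current best search point as parent. The function names, the `target` stopping value, the `max_evals` safety cap and the `seed` parameter are assumptions introduced for this illustration and are not part of Algorithm 1.

```python
import random
from typing import Callable, List, Optional, Tuple


def mutate(x: List[int], p: float, rng: random.Random) -> List[int]:
    """Copy x and flip each bit independently with probability p."""
    return [1 - b if rng.random() < p else b for b in x]


def one_plus_one_ea_mu(
    f: Callable[[List[int]], int],
    n: int,
    p: float,
    target: int,
    mu: int = 1,
    max_evals: int = 10**6,        # illustrative safety cap, not part of the scheme
    seed: Optional[int] = None,
) -> Tuple[int, List[int]]:
    """Return (number of fitness evaluations, best search point found)."""
    rng = random.Random(seed)
    # Initialization: mu search points drawn uniformly at random.
    population = [[rng.randrange(2) for _ in range(n)] for _ in range(mu)]
    x = max(population, key=f)
    fx = f(x)
    t = mu  # the time index counts the mu initial evaluations
    while fx < target and t < max_evals:
        y = mutate(x, p, rng)      # variation by bit-flip mutation only
        fy = f(y)
        t += 1
        if fy >= fx:               # elitist selection: keep the offspring if not worse
            x, fx = y, fy
    return t, x
```

A call such as `one_plus_one_ea_mu(sum, n=20, p=1/20, target=20)` runs the sketch on OneMax, since `sum` of a 0/1 list counts its ones. Other members of the class differ only in how the parent is selected from the history of search points.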



The optimization time is measured by the time index $t$ that counts the number of function evaluations. It
is defined as the time index $t$ when a global optimum is found first. In a more general sense, we can also
regard the (expected) hitting time of a set of desirable search points. For some lower bounds and for small
values of $\mu$, we pessimistically disregard the effort for creating the $\mu$ search points.

The parent selection mechanism is very general as any mechanism based on the time index t and fitness 
values of previous search points may be used. Any mechanism for managing a population fits in this 
framework. This includes parent populations and offspring populations with arbitrary selection strategies 
and even parallel evolutionary algorithms with spatial structures and migration such as the island model [26].

The (1+1) EA is a well-known special case with population size $\mu = 1$. It maintains a single individual
$x$ and in every iteration it creates $x'$ by mutating $x$ and replaces $x$ by $x'$ if $f(x') \ge f(x)$. We denote by
(1+1) EA$_\mu$ a generalization of the (1+1) EA that is initialized with a best individual out of $\mu$ individuals
which are generated uniformly at random.

Before introducing the new lower-bound method we elaborate on the range of sensible values for the
mutation rate $p$. The expected number of flipping bits equals $pn$. This is 1 for the standard choice $p = 1/n$.
If $p \ll 1/n$ then the expected number of flipping bits is close to 0. The expected time until mutation
creates any offspring that is different from its parent is then at least $1/(pn)$, as $pn$ is an upper bound on
the probability that any bit flips. This means that $p$ must be at least an inverse polynomial to allow for
polynomial expected running times (unless the initialization finds a global optimum with high probability).

If the problem only contains a single optimum that has to be hit, $p$ cannot be too large. If $p \le 1/2$
then the best probability for hitting the optimum from a non-optimal parent is obtained when the parent
has Hamming distance 1 to the optimum. Then the probability is $p(1-p)^{n-1} \le (1-p)^n \le e^{-pn}$ and the
expected waiting time until this happens is at least $e^{pn}$. We summarize these findings in the following theorem,
showing that unreasonable parameter settings lead to unreasonable running times. Note that the optimum
is not found during initialization with population size $\mu$ with probability at least $1 - \mu \cdot 2^{-n}$.

Theorem 1. Let $f$ be a function with a unique global optimum. The expected optimization time of every
mutation-based EA on $f$ with mutation probability $0 < p \le 1/2$ is at least $(1 - \mu \cdot 2^{-n}) \cdot 1/(pn)$ and at least
$(1 - \mu \cdot 2^{-n}) \cdot e^{pn}$.



In particular, for every $\mu$ the expected optimization time is superpolynomial if $p = n^{-\omega(1)}$ or
$p = \omega(\log n)/n$, and exponential (i.e., $2^{n^{\varepsilon}}$ for some constant $\varepsilon > 0$) if
$p \le 2^{-n^{\Omega(1)}}$ or $p = n^{\Omega(1)-1}$.

The result can be extended towards functions with multiple global optima, but the above result suffices
for our purposes. 
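
The two bounds of Theorem 1 are easy to evaluate numerically, which shows how quickly they grow once $p$ leaves the range around $1/n$. The following is a small sketch; the function name `theorem1_lower_bounds` and the clamping of the initialization factor at 0 (for very large $\mu$) are choices made here for illustration.

```python
import math
from typing import Tuple


def theorem1_lower_bounds(n: int, p: float, mu: int = 1) -> Tuple[float, float]:
    """Evaluate both lower bounds from Theorem 1 for 0 < p <= 1/2.

    Returns ((1 - mu * 2^-n) / (p * n), (1 - mu * 2^-n) * e^(p * n)).
    """
    if not 0.0 < p <= 0.5:
        raise ValueError("Theorem 1 assumes 0 < p <= 1/2")
    # Lower bound on the probability that initialization misses the optimum;
    # clamped at 0 so that the bounds stay meaningful for huge mu.
    init_miss = max(0.0, 1.0 - mu * 2.0 ** (-n))
    too_small = init_miss / (p * n)        # waiting for any bit to flip
    too_large = init_miss * math.exp(p * n)  # waiting to hit a single target point
    return too_small, too_large
```

For $n = 100$, the standard choice $p = 1/n$ gives the harmless values of about 1 and $e$, whereas $p = 1/n^2$ already forces about $n = 100$ evaluations through the first bound and $p = 0.3$ forces about $e^{30}$ evaluations through the second bound.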

2.2 The Fitness-Level Method for Proving Upper Bounds 

We review the fitness-level method, also known as the method of f-based partitions [47]. It yields upper
bounds for EAs whose best fitness value in the population never decreases. We call these algorithms elitist
EAs.


The idea is as follows. We partition the search space into sets that are strictly ordered with respect to the
fitness of the contained individuals. Every search point in a higher fitness-level set has a strictly higher fitness
than any search point in a lower fitness-level set. We say that an elitist algorithm is on a particular level if
the best search point created so far is in the respective fitness-level set. Due to the elitism, the algorithm can
only increase its current fitness level. If we have a lower bound on the probability of increasing the current
level, the reciprocal is an upper bound on the expected time until a particular fitness level is left. As each
level is left for good, the sum of all these times, starting from the initial level, yields an upper bound on
the expected optimization time.

Theorem 2 (Fitness-level method for proving upper bounds). For two sets $A, B \subseteq \{0,1\}^n$ and fitness
function $f$ let $A <_f B$ if $f(a) < f(b)$ for all $a \in A$ and all $b \in B$. Consider a partition of the search space
into non-empty sets $A_1, \dots, A_m$ such that $A_1 <_f A_2 <_f \dots <_f A_m$ and $A_m$ only contains global optima.
For a mutation-based EA $\mathcal{A}$ we say that $\mathcal{A}$ is in $A_i$ or on level $i$ if the best individual created so far is in $A_i$.
Consider some elitist EA $\mathcal{A}$ and let $s_i$ be a lower bound on the probability of creating a new offspring in
$A_{i+1} \cup \dots \cup A_m$, provided $\mathcal{A}$ is in $A_i$. Then the expected optimization time of $\mathcal{A}$ on $f$ (without the cost of
initialization) is bounded by

$$\sum_{i=1}^{m-1} P(\mathcal{A} \text{ starts in } A_i) \cdot \sum_{j=i}^{m-1} \frac{1}{s_j} \;\le\; \sum_{j=1}^{m-1} \frac{1}{s_j}. \qquad (1)$$



The second bound results from pessimistically assuming that the algorithm is always initialized in $A_1$.

Let us illustrate the method with two examples for the (1+1) EA with mutation probability $p = 1/n$. We
define the canonical partition as the partition in which $A_i$ contains exactly all search points with fitness $i$.
For LO the method applied to the canonical partition yields an upper bound of $\sum_{i=0}^{n-1} en = en^2$ since the
probability of finding an improvement is lower bounded by the probability of flipping only the first bit with
value 0. This probability is at least $1/n \cdot (1 - 1/n)^{n-1} \ge 1/(en)$. For OneMax we get an upper bound of
$\sum_{i=0}^{n-1} en/(n-i) = en \sum_{i=1}^{n} 1/i \le en \ln n + O(n)$ for the (1+1) EA since on level $i$ there are $n - i$ 1-bit
mutations that flip a 0-bit to 1 and hence improve the fitness.
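
The two example bounds can be reproduced numerically. The sketch below sums the reciprocals $1/s_i$ over the canonical partition, using the exact single-bit-flip probability $(1/n)(1-1/n)^{n-1}$ instead of its estimate $1/(en)$, so the computed values are slightly smaller than $en^2$ and $en(\ln n + 1)$. All function names are chosen here for illustration.

```python
import math
from typing import List, Sequence


def fitness_level_upper_bound(s: Sequence[float]) -> float:
    """Pessimistic bound from Theorem 2: sum of 1/s_i over all non-optimal levels."""
    return sum(1.0 / s_i for s_i in s)


def levels_leading_ones(n: int) -> List[float]:
    """s_i for the (1+1) EA with p = 1/n on LO: flip exactly the first 0-bit."""
    p = 1.0 / n
    return [p * (1.0 - p) ** (n - 1)] * n


def levels_onemax(n: int) -> List[float]:
    """s_i for the (1+1) EA with p = 1/n on OneMax: one of n - i single-bit flips."""
    p = 1.0 / n
    return [(n - i) * p * (1.0 - p) ** (n - 1) for i in range(n)]


if __name__ == "__main__":
    n = 100
    print(fitness_level_upper_bound(levels_leading_ones(n)), math.e * n * n)
    print(fitness_level_upper_bound(levels_onemax(n)), math.e * n * (math.log(n) + 1))
```

Each printed pair shows the computed bound next to the closed-form estimate from the text.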

In order to make an effort towards a unified theory of search heuristics, we also present the following
extension. After finding an improvement, stochastic search algorithms often need some time to adapt their
underlying probabilistic models. For instance, the ($\mu$+1) EA investigated by Witt [49] needs some
time until the population contains "enough" individuals on a new fitness level, so that an improvement
can be found with a good probability. The ant colony optimization algorithms investigated in Gutjahr and
Sebastiani [17] as well as in Neumann, Sudholt, and Witt [30] need some time to adapt their pheromones
towards a new best solution. A similar argument holds for velocities in a binary particle swarm optimization
algorithm investigated by Sudholt and Witt [44].

In all these studies, it is pessimistically disregarded that an improvement might be found while waiting
for the algorithm to adapt. Fix a notion of adaptation and let $T_i$ be the (random) time until an algorithm
has adapted, after a new fitness level $i$ has been found. Redefining $s_i$ as the worst-case probability of finding
an improvement in one iteration after adaptation, the expected optimization time can be bounded by

$$\sum_{i=1}^{m-1} \left( E(T_i) + \frac{1}{s_i} \right).$$



In addition, Lehre [57] recently presented an extension towards non-elitist populations, with applications 
to comma strategies and various selection operators. Roughly speaking, he proves that if 

• the probability of generating an offspring on a worse fitness level is not too large, 

• selection has a strong enough tendency to pick high-fitness individuals, and

• the population is large enough 

then an upper bound similar to the one in Theorem 2 applies. The running time bound is asymptotic and
does not reveal a precise constant factor. Nevertheless, his work shows that the method is applicable in a much
more general context.

3 Lower Bounds with Fitness Levels 

We now show that fitness-level arguments can also be applied to show tight lower bounds on the running time. 
Researchers have attempted to make this step earlier. The best lower bounds with fitness-level arguments 
known so far were presented by Wegener in [47] . 

Lemma 1 (Wegener [47]). Let $A_1 <_f \dots <_f A_m$ be a fitness-level partition for some fitness function $f$. Let
$u_i$ be an upper bound on the probability of an EA $\mathcal{A}$ creating a new offspring in $A_{i+1} \cup \dots \cup A_m$, provided $\mathcal{A}$
is in $A_i$ (where "$\mathcal{A}$ is in $A_i$" is defined as in Theorem 2). Then the expected optimization time of $\mathcal{A}$ on $f$ is
at least

$$\sum_{i=1}^{m-1} P(\mathcal{A} \text{ starts in } A_i) \cdot \frac{1}{u_i}.$$

The resulting lower bounds are very weak since we only look at the time it takes to leave the initial fitness 
level and then pessimistically assume that the optimum is found. 

For instance, for the (1+1) EA with mutation probability $p = 1/n$ on OneMax, Lemma 1 yields the lower
bound 

n 

as the initialization is very likely to create a search point with around n/2 1-bits. For the (1+1) EA on LO
we get the lower bound

$$\sum_{i=0}^{n-1} 2^{-i-1} \cdot \frac{1}{1/n} = (1 - 2^{-n}) \cdot n,$$

which is again very crude; the real expected running time is of order $\Theta(n^2)$.

Much better lower bounds can be achieved by making an additional assumption about the transition
probabilities between fitness levels. The idea is as follows. If we know that a search algorithm typically does
not skip too many fitness levels, it is likely that many fitness levels need to be traversed. This yields a lower
bound that is proportional to the upper bound from Theorem 2.

In the following theorem $\gamma_{i,j}$ can be regarded as the conditional probability of jumping from level $i$ to
level $j$, given that the algorithm leaves level $i$.

Theorem 3. Consider an algorithm $\mathcal{A}$ and a partition of the search space into non-empty sets $A_1, \dots, A_m$.
For a mutation-based EA $\mathcal{A}$ we again say that $\mathcal{A}$ is in $A_i$ or on level $i$ if the best individual created so
far is in $A_i$. Let the probability of $\mathcal{A}$ traversing from level $i$ to level $j$ in one step be at most $u_i \cdot \gamma_{i,j}$ and
$\sum_{j=i+1}^{m} \gamma_{i,j} = 1$. Assume that for all $j > i$ and some $0 \le \chi \le 1$ it holds

$$\gamma_{i,j} \ge \chi \sum_{k=j}^{m} \gamma_{i,k}. \qquad (2)$$

Then the expected hitting time of $A_m$ is at least

$$\sum_{i=1}^{m-1} P(\mathcal{A} \text{ starts in } A_i) \cdot \left( \frac{1}{u_i} + \chi \sum_{j=i+1}^{m-1} \frac{1}{u_j} \right) \qquad (3)$$

$$\ge \sum_{i=1}^{m-1} P(\mathcal{A} \text{ starts in } A_i) \cdot \chi \sum_{j=i}^{m-1} \frac{1}{u_j}. \qquad (4)$$
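
The theorem translates into two small computations: checking the condition on the $\gamma$-variables and evaluating the first lower bound. The following sketch does both. Levels are 0-indexed here (level $m-1$ is the optimal one), and the function names as well as the numbers in the usage example are hypothetical values chosen only for illustration.

```python
from typing import Sequence


def satisfies_viscosity_condition(
    gamma: Sequence[Sequence[float]], chi: float, tol: float = 1e-12
) -> bool:
    """Check that each row i of gamma sums to 1 over j > i and that
    gamma[i][j] >= chi * sum(gamma[i][j:]) holds for all j > i."""
    m = len(gamma)
    for i in range(m - 1):
        if abs(sum(gamma[i][i + 1:]) - 1.0) > tol:
            return False
        for j in range(i + 1, m):
            if gamma[i][j] + tol < chi * sum(gamma[i][j:]):
                return False
    return True


def fitness_level_lower_bound(
    start_prob: Sequence[float], u: Sequence[float], chi: float
) -> float:
    """Evaluate the first lower bound of Theorem 3.

    u[i] is the bound on leaving level i for the m - 1 non-optimal levels and
    start_prob[i] is the probability of starting on level i.
    """
    m = len(u) + 1
    total = 0.0
    for i in range(m - 1):
        later_levels = sum(1.0 / u[j] for j in range(i + 1, m - 1))
        total += start_prob[i] * (1.0 / u[i] + chi * later_levels)
    return total


if __name__ == "__main__":
    # Hypothetical 4-level example: jump lengths decay geometrically with base 2.
    gamma = [
        [0.0, 0.5, 0.25, 0.25],
        [0.0, 0.0, 0.5, 0.5],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    u = [0.5, 0.25, 0.125]
    start = [1.0, 0.0, 0.0, 0.0]
    assert satisfies_viscosity_condition(gamma, 0.5)
    print(fitness_level_lower_bound(start, u, 0.5))
```

With $\chi = 0$ the function returns only $1/u_0$ for a start on the lowest level, which is the bound of Lemma 1, and with $\chi = 1$ it returns the full sum of the $1/u_i$.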



The variable $\chi$ was coined viscosity by Jon Rowe [36]. Similar to the viscosity of a liquid, it resembles the
viscosity of the fitness-level partition on a scale between 0 and 1. A low viscosity means that we can have
situations where a search algorithm skips many fitness levels and only few levels are actually encountered.
A high viscosity means that a search algorithm typically encounters many fitness levels as large jumps to
higher fitness levels are unlikely.

For $\chi > 0$ the reciprocal, $1/\chi$, is an upper bound on the expected number of fitness levels gained during
an improvement. To see this, note that condition (2) implies $\sum_{k=j}^{m} \gamma_{i,k} \le (1-\chi) \cdot \sum_{k=j-1}^{m} \gamma_{i,k}$ for all $j > i+1$.
This implies $\sum_{k=j}^{m} \gamma_{i,k} \le (1-\chi)^{j-i-1} \sum_{k=i+1}^{m} \gamma_{i,k} = (1-\chi)^{j-i-1}$. Using $E(X) = \sum_{x=0}^{\infty} P(X > x)$ if $X$
takes only non-negative integer values, the expected progress in terms of fitness levels, assuming current
level $i$, is at most

$$\sum_{j=i+1}^{m} \sum_{k=j}^{m} \gamma_{i,k} \le \sum_{j=i+1}^{m} (1-\chi)^{j-i-1} = \sum_{j=0}^{m-i-1} (1-\chi)^{j} = \frac{1 - (1-\chi)^{m-i}}{\chi}.$$

This yields $1/\chi$ as an upper bound that is independent of the current level.

In a case of extreme viscosity, i.e., $\chi = 1$, condition (2) can only hold if $\gamma_{i,i+1} = 1$ and $\gamma_{i,i+k} = 0$ for all
$1 \le i \le m-1$ and all $2 \le k \le m-i$. This means that the algorithm deterministically reaches the next
fitness level when an improvement is made. It passes through all fitness levels between the initial one and
the optimal one. These are the strongest possible conditions on the transition probabilities.

In contrast, if $\chi = 0$ we have no viscosity at all. Condition (2) is trivially satisfied for all choices of
the $\gamma$-variables. This is the weakest possible setting and it leaves open the possibility that the optimum is
reached by a direct jump, when the current fitness level is left. In fact, the resulting bound (3) equals the
one from Lemma 1.

The most interesting settings are those where the viscosity is between 0 and 1. For instance, if $\chi = 1/2$
then condition (2) is roughly equivalent to the $\gamma$-variables decreasing exponentially with base 2: $\gamma_{i,i+k} \le 2^{-k}$.
Larger viscosities require a steeper decay, while smaller viscosities allow for a less steep decay. For selected
fitness levels on OneMax it turns out that the transition probabilities decay rapidly, allowing us to choose $\chi$ as
high as $1 - o(1)$. This means that only a vanishing fraction of fitness levels is skipped (in expectation) and
it leads to a very tight lower bound.
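
For a given table of $\gamma$-variables one can compute the largest viscosity that condition (2) permits and compare the expected number of levels gained with the bound $1/\chi$ derived above. The sketch below does this for a hypothetical 4-level example with geometrically decaying jump probabilities; levels are 0-indexed, and all names and numbers are assumptions made for illustration.

```python
from typing import Sequence


def max_viscosity(gamma: Sequence[Sequence[float]]) -> float:
    """Largest chi with gamma[i][j] >= chi * sum(gamma[i][j:]) for all j > i."""
    m = len(gamma)
    chi = 1.0
    for i in range(m - 1):
        for j in range(i + 1, m):
            tail = sum(gamma[i][j:])
            if tail > 0.0:  # an empty tail imposes no restriction on chi
                chi = min(chi, gamma[i][j] / tail)
    return chi


def expected_levels_gained(gamma_row: Sequence[float], i: int) -> float:
    """Expected jump length from level i, given that level i is left."""
    return sum((j - i) * g for j, g in enumerate(gamma_row) if j > i)


if __name__ == "__main__":
    gamma = [
        [0.0, 0.5, 0.25, 0.25],
        [0.0, 0.0, 0.5, 0.5],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    chi = max_viscosity(gamma)
    for i in range(len(gamma) - 1):
        levels_above = len(gamma) - 1 - i
        bound = (1.0 - (1.0 - chi) ** levels_above) / chi
        print(i, expected_levels_gained(gamma[i], i), bound, 1.0 / chi)
```

In this example the expected jump length from the lowest level equals the level-dependent bound $(1 - (1-\chi)^{m-i})/\chi$ exactly, which indicates that this geometric decay is the extreme case for $\chi = 1/2$.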

Before we get to the proof of Theorem 3, we state the following conclusions about how tight the upper
and lower bounds with fitness levels can be.

Corollary 1. Let $A_1, \dots, A_m$, $\chi$, $s_i$, $u_i$, and $\gamma_{i,j}$ for $1 \le i, j \le m$ be defined as in Theorems 2 and 3. Let all
conditions in these theorems hold.

1. If $s_i = u_i$ for all $i$ then the lower bound (4) matches the upper bound (1) up to a factor of $\chi$.

2. If $\chi > 0$ is a constant and there is a constant $c \ge 1$ such that $u_i \le c \cdot s_i$ for $1 \le i \le m-1$ then (4)
and (1) are asymptotically equal.

3. If $\chi = 1 - o(1)$ and $u_i \le (1 + o(1)) \cdot s_i$ for $1 \le i \le m-1$ then (4) and (1) are equal up to lower-order
terms.

A fitness-level partition that obeys the second case was called (asymptotically) tight f-based partition
in [26].
We proceed by proving Theorem 3. Afterwards, we give advice on how to apply it, including example
applications in the following sections.

Proof of Theorem 3. The second bound immediately follows from the first one since $0 \le \chi \le 1$. Let $E_i$ be
the minimum expected remaining optimization time, where the minimum is taken over all possible histories
$x_1, \dots, x_t$ of previous search points with $x_1, \dots, x_t \in A_1 \cup \dots \cup A_i$. By definition
$E_1 \ge E_2 \ge \dots \ge E_m = 0$ as the conditions on the histories are subsequently relaxed. By the law of total
expectation the unconditional expected optimization time is at least
$\sum_{i=1}^{m-1} P(\mathcal{A} \text{ starts in } A_i) \cdot E_i$, hence we only need to bound $E_i$.

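After one step, for each $k > i$ the algorithm is in $A_k$ with probability at most $u_i \gamma_{i,k}$ and it
remains in $A_i$ with probability $1 - \sum_{k=i+1}^{m} u_i \gamma_{i,k} = 1 - u_i$. This establishes the recurrence

\[ E_i \ge 1 + \sum_{k=i+1}^{m-1} u_i \gamma_{i,k} \cdot E_k + (1 - u_i) \cdot E_i. \]

Subtracting $(1 - u_i) E_i$ on both sides and dividing by $u_i$ yields

\[ E_i \ge \frac{1}{u_i} + \sum_{k=i+1}^{m-1} \gamma_{i,k} \cdot E_k. \]

Assume for an induction that for all $k > i$ it holds $E_k \ge \frac{1}{u_k} + \chi \sum_{j=k+1}^{m-1} \frac{1}{u_j}$.
Then we get

\[ E_i \ge \frac{1}{u_i} + \sum_{k=i+1}^{m-1} \gamma_{i,k} \left( \frac{1}{u_k} + \chi \sum_{j=k+1}^{m-1} \frac{1}{u_j} \right). \]

Note that

\[ \sum_{k=i+1}^{m-1} \gamma_{i,k} \cdot \chi \sum_{j=k+1}^{m-1} \frac{1}{u_j}
   = \chi \sum_{j=i+1}^{m-1} \frac{1}{u_j} \sum_{k=i+1}^{j-1} \gamma_{i,k} \]

since on the left-hand side every term $1/u_j$ appears for all summands $k = i+1, \dots, j-1$ in the outer sum,
each summand weighted by $\gamma_{i,k} \chi$. Together, using condition (2) in the second step, we get

\[ E_i \ge \frac{1}{u_i} + \sum_{j=i+1}^{m-1} \frac{1}{u_j} \left( \gamma_{i,j} + \chi \sum_{k=i+1}^{j-1} \gamma_{i,k} \right)
   \ge \frac{1}{u_i} + \sum_{j=i+1}^{m-1} \frac{1}{u_j} \left( \chi \sum_{k=j}^{m} \gamma_{i,k} + \chi \sum_{k=i+1}^{j-1} \gamma_{i,k} \right)
   = \frac{1}{u_i} + \chi \sum_{j=i+1}^{m-1} \frac{1}{u_j}. \]

□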
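
Once the $u_i$, the viscosity $\chi$ and the initial distribution are fixed, bound (3) is a simple sum. The following Python sketch illustrates this. The function names `check_viscosity` and `fitness_level_lower_bound` and the 0-based list layout are assumptions made for this illustration and do not appear in the method itself.

```python
from typing import Sequence


def check_viscosity(gamma: Sequence[Sequence[float]], chi: float, tol: float = 1e-12) -> bool:
    """Check that the gamma-values of each level sum to 1 and satisfy condition (2).

    gamma[i][j] is only read for j > i; levels are indexed 0..m-1 here
    (an assumed layout), the last level being the target set A_m.
    """
    m = len(gamma)
    for i in range(m - 1):
        row = gamma[i]
        if abs(sum(row[i + 1:]) - 1.0) > tol:
            return False
        tail = 0.0
        # Walk j downwards so that `tail` equals sum_{k >= j} gamma[i][k].
        for j in range(m - 1, i, -1):
            tail += row[j]
            if row[j] < chi * tail - tol:
                return False
    return True


def fitness_level_lower_bound(u: Sequence[float], chi: float,
                              start_probs: Sequence[float]) -> float:
    """Evaluate bound (3): sum_i P(start in A_i) * (1/u_i + chi * sum_{j>i} 1/u_j).

    u and start_probs cover the non-optimal levels only.
    """
    inv = [1.0 / x for x in u]
    bound = 0.0
    suffix = 0.0  # sum of 1/u_j over all j > i
    for i in range(len(u) - 1, -1, -1):
        bound += start_probs[i] * (inv[i] + chi * suffix)
        suffix += inv[i]
    return bound
```

For instance, with two non-optimal levels, `u = [0.1, 0.2]`, `chi = 0.5` and a start on the first level, the bound evaluates to $10 + 0.5 \cdot 5 = 12.5$.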

One crucial asset of the theorem is that in order to apply it, we do not need to know the transition
probabilities exactly. It suffices to state upper bounds on the transition probabilities. More precisely, we
require that $u_i \cdot \gamma_{i,j}$ is an upper bound on the probability of jumping from level $i$ to level $j$. We
have the freedom to choose $u_i$ and $\gamma_{i,j}$ as long as the $\gamma$-variables sum up to 1 and they fulfil (2).
In cases where the transition probabilities are not known precisely or where it is not possible or feasible
to derive an analytical expression, we can use different $\gamma$-variables as substitutes. Note that the condition
on $u_i \cdot \gamma_{i,j}$ upper bounding the real transition probability is easier to fulfil if $u_i$ is large. So, we
can choose $u_i$ as large as necessary in order to prove the conditions of Theorem 3. The price for choosing a
large $u_i$ is that the resulting lower bound becomes smaller as the $u_i$'s grow.

A similar observation holds for the choice of $\chi$. As remarked before, the higher the viscosity $\chi$, the
stronger the conditions on the transition probabilities are. The lower $\chi$, the easier it is to establish
condition (2), and the smaller the lower bound becomes.

The method is hence very versatile and flexible as we are free to choose $\chi$ and the $u$- and $\gamma$-variables
such that all conditions hold. The upcoming example applications give advice as to how these values can be
chosen.

Note that the theorem does not require the sets $A_i$ to form fitness levels: we do not assume
$A_1 <_f \dots <_f A_m$. The conditions on the $\gamma$-variables indirectly imply that sets with small index are
"worse" than sets with higher index. Also note that Theorem 3 bounds the expected hitting time of set $A_m$. This
includes the expected optimization time as the special case in which $A_m$ contains exactly all global optima.
Alternatively, $A_m$ can contain other desirable solutions such as those with a certain minimum fitness, all local
optima, all feasible solutions, etc.



4 Refined Upper Bounds with Fitness Levels 

It has become clear that information about the transition probabilities is essential for proving meaningful
lower bounds. This knowledge can also help to obtain refined upper bounds. The following result is very
similar to Theorem 3, with some inequalities reversed. Also the proof ideas are very similar to the ones
in Theorem 3. In contrast to the lower bound, we need to add the condition $(1 - \chi) s_j \le s_{j+1}$ for all
$1 \le j \le m - 2$, which states that the success probabilities must not be imbalanced.

Theorem 4. Consider a partition of the search space into non-empty sets
$A_1 <_f A_2 <_f \dots <_f A_m$ such that only $A_m$ contains global optima. For an elitist mutation-based EA
$\mathcal{A}$ we again say that $\mathcal{A}$ is in $A_i$ or on level $i$ if the best individual created so far is in
$A_i$. Let the probability of traversing from level $i$ to level $j$ in one step be at least $s_i \cdot \gamma_{i,j}$
and $\sum_{j=i+1}^{m} \gamma_{i,j} = 1$. Assume that for all $j > i$ and some $0 \le \chi \le 1$ it holds

\[ \gamma_{i,j} \le \chi \sum_{k=j}^{m} \gamma_{i,k}. \tag{7} \]

Further assume $(1 - \chi) s_j \le s_{j+1}$ for all $1 \le j \le m - 2$. Then the expected hitting time of $A_m$ is
at most

\[ \sum_{i=1}^{m-1} P(\mathcal{A} \text{ starts in } A_i) \cdot \left( \frac{1}{s_i} + \chi \sum_{j=i+1}^{m-1} \frac{1}{s_j} \right). \tag{8} \]



For maximum viscosity, i.e., $\chi = 1$, the condition $(1 - \chi) s_j \le s_{j+1}$ as well as condition (7) are
always true. We then get the classical fitness-level method from Theorem 2. The refined upper bound method from
Theorem 4 is hence more general than the classical method from Theorem 2. Lower viscosities lead to better
upper bounds. For instance, a constant viscosity between 0 and 1 typically reduces the upper bound by a
constant factor, compared to Theorem 2. Unlike for lower bounds, a viscosity of $\chi = 0$ is impossible. Similar to
the lower bound, we have that $\frac{1 - (1 - \chi)^{m-i}}{\chi}$ is now a lower bound for the expected number of
gained fitness levels in an improvement from level $i$.
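
The refined upper bound (8) can be evaluated in the same way as the lower bound, with the additional balance condition on the success probabilities checked first. The sketch below is illustrative. The name `fitness_level_upper_bound` and the 0-based list layout are assumptions, and condition (7) on the $\gamma$-values is assumed to have been verified separately.

```python
from typing import Sequence


def fitness_level_upper_bound(s: Sequence[float], chi: float,
                              start_probs: Sequence[float]) -> float:
    """Evaluate bound (8): sum_i P(start in A_i) * (1/s_i + chi * sum_{j>i} 1/s_j).

    s and start_probs cover the non-optimal levels only. Raises ValueError if
    the balance condition (1 - chi) * s_j <= s_{j+1} is violated.
    """
    for j in range(len(s) - 1):
        if (1.0 - chi) * s[j] > s[j + 1]:
            raise ValueError(f"(1 - chi) * s[{j}] exceeds s[{j + 1}]")
    inv = [1.0 / x for x in s]
    bound = 0.0
    suffix = 0.0  # sum of 1/s_j over all j > i
    for i in range(len(s) - 1, -1, -1):
        bound += start_probs[i] * (inv[i] + chi * suffix)
        suffix += inv[i]
    return bound
```

With `chi = 1.0` the function returns the classical fitness-level bound. Smaller values of `chi` shrink the contribution of all levels above the starting level.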



Proof of Theorem 4. Let $E_i$ be the worst-case expected remaining optimization time, given that the
algorithm is in $A_i$. The worst case is over all histories that contain at least one search point in $A_i$. By the
law of total expectation the unconditional expected optimization time is at most
$\sum_{i=1}^{m-1} P(\mathcal{A} \text{ starts in } A_i) \cdot E_i$, hence we only need to bound $E_i$.






Assume for an induction that for all $i < k \le m - 1$ it holds

\[ E_k \le \frac{1}{s_k} + \chi \sum_{j=k+1}^{m-1} \frac{1}{s_j} =: b_k, \]

$b_k$ denoting an upper bound for $E_k$. The assumption holds trivially for $i = m - 1$.

We now claim that $E_i \le b_i$. Note that the bounds are non-increasing: $b_{i+1} \ge b_{i+2} \ge \dots \ge b_{m-1}$.
The reason is that for all $i \le j \le m - 2$ we have

\[ b_j - b_{j+1} = \frac{1}{s_j} - \frac{1 - \chi}{s_{j+1}} \ge 0 \]

as $(1 - \chi) s_j \le s_{j+1}$ by assumption. Now, if $E_i \le b_{i+1}$ then also $E_i \le b_i$ and the claim follows.
We therefore assume $E_i > b_{i+1}$ in the following, which implies $E_i > b_j$ for all $j > i$. Intuitively, this
means that, when relying on $E_i$ and the upper bounds $b_{i+1}, \dots, b_{m-1}$, leaving $A_i$ towards any $A_j$,
$j > i$, is always better than staying in $A_i$. We are being pessimistic if we overestimate the probability of
staying in $A_i$.

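This justifies the following recurrence. After one step the algorithm is in $A_k$ with probability at least
$s_i \gamma_{i,k}$ and then the expected remaining optimization time is bounded by $b_k$. The algorithm remains in
$A_i$ with probability $1 - \sum_{k=i+1}^{m} s_i \gamma_{i,k} = 1 - s_i$ and then the remaining time is again bounded
by $E_i$. This gives

\[ E_i \le 1 + \sum_{k=i+1}^{m-1} s_i \gamma_{i,k} \cdot b_k + (1 - s_i) \cdot E_i \]

and rearranging yields

\[ E_i \le \frac{1}{s_i} + \sum_{j=i+1}^{m-1} \gamma_{i,j} \cdot b_j. \]

Then we get, inserting the definition of $b_j$, exchanging the order of summation as in the proof of Theorem 3,
and applying condition (7),

\[ E_i \le \frac{1}{s_i} + \sum_{j=i+1}^{m-1} \frac{1}{s_j} \left( \gamma_{i,j} + \chi \sum_{k=i+1}^{j-1} \gamma_{i,k} \right)
   \le \frac{1}{s_i} + \chi \sum_{j=i+1}^{m-1} \frac{1}{s_j} \sum_{k=i+1}^{m} \gamma_{i,k}
   = \frac{1}{s_i} + \chi \sum_{j=i+1}^{m-1} \frac{1}{s_j} = b_i. \]

□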




5 An Exact Formula for LeadingOnes 

Our first application of the lower-bound method is for LO as here the $\gamma$-values can be estimated in a very
natural and precise way.

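Theorem 5. Let $X_\mu$ be a random variable that describes the maximum LO-value among $\mu$ individuals
created independently and uniformly at random. For every $n \ge 2$ the expected optimization time of every
mutation-based EA on LO using mutation probability $0 < p \le 1/2$ is at least

\[ \sum_{i=0}^{n-1} P(X_\mu = i) \cdot \left( \frac{1}{p(1-p)^i} + \frac{1}{2} \sum_{j=i+1}^{n-1} \frac{1}{p(1-p)^j} \right) \tag{9} \]

\[ = \sum_{i=0}^{n-1} P(X_\mu = i) \cdot \frac{1}{2p^2} \cdot \left( (1-p)^{-n+1} - (1-2p)(1-p)^{-i} \right) \tag{10} \]

\[ \ge \frac{1}{2p^2} \left( (1-p)^{-n+1} - 1 \right) - \frac{O(\log n)}{p}, \tag{11} \]

the last inequality holding for $p \ge n^{-O(1)}$ and $p \le \ln\ln n \cdot 1/n$.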

Proof. Consider the canonical partition and assume that the algorithm is on level $i < n$. This implies that
in the best individual created so far the first $i + 1$ bits are predetermined. In addition, in all individuals
created so far the bits at positions $i + 2, \dots, n$ have not contributed to the fitness yet. These bits have been
initialized uniformly at random and they have been subjected to random mutations. It is easy to see that
this again results in uniform random bits. More precisely, the probability that a specific bit $j$ with
$j \ge i + 2$ in a specific individual has a specific bit value 0 or 1 is exactly 1/2 (see the proof of Theorem 17
in Droste, Jansen, and Wegener [15]).

Consider an individual $x$ that has been selected as parent among the created individuals. Let
$\mathrm{LO}(x) = j \le i$. We bound the probability of creating an offspring with $k$ leading ones for some
$i + 1 \le k \le n$. One necessary condition is that the first $j$ leading ones do not flip, which happens with
probability $(1-p)^j$. The bit at position $j + 1$ is 0, hence it must be flipped. All bits at positions
$j + 2, \dots, i + 1$ must obtain the value 1 in the offspring. This probability is determined by the number of ones
among these bits. But clearly $(1-p)^{i-j}$ is an upper bound on this probability since this reflects the best-case
scenario that all these bits are 1 in the parent. (Since $p \le 1/2$ the probability of flipping a bit is not larger
than the probability of not flipping it.) The last necessary condition is to create exactly $k - 1 - i$ further
leading ones among the bits at positions $i + 2, \dots, n$. By the preceding arguments on the "randomness" of these
bits, the probability of creating exactly $k - 1 - i$ ones is $2^{-k+i} =: \gamma_{i,k}$ if $k < n$ and
$2^{-k+i+1} =: \gamma_{i,k}$ if $k = n$. Putting everything together, we have that $p(1-p)^i \cdot \gamma_{i,k}$ is an
upper bound on the probability of jumping to level $k$.

Checking the condition on the $\gamma$-values,
$\sum_{k=i+1}^{n} \gamma_{i,k} = \sum_{k=i+1}^{n-1} 2^{-k+i} + 2^{-n+i+1} = 1$ and for all $i < j < n$
condition (2) holds with equality since

\[ \sum_{k=j}^{n} \gamma_{i,k} = \sum_{k=j}^{n-1} 2^{-k+i} + 2^{-n+i+1} = 2^{-j+i+1} = 2 \gamma_{i,j}. \]

Setting $\chi := 1/2$, the preconditions for Theorem 3 are fulfilled. Using $u_i := p(1-p)^i$, this proves the bound

\[ \sum_{i=0}^{n-1} P(X_\mu = i) \cdot \left( \frac{1}{u_i} + \frac{1}{2} \sum_{j=i+1}^{n-1} \frac{1}{u_j} \right) \]

and hence (9).

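For the second bound, observe that the bracketed term in (9) can be simplified as

\[ \frac{1}{p(1-p)^i} + \frac{1}{2p} \sum_{j=i+1}^{n-1} (1-p)^{-j}
   = \frac{1}{2p} \left( (1-p)^{-i} + \sum_{j=0}^{n-1} (1-p)^{-j} - \sum_{j=0}^{i-1} (1-p)^{-j} \right) \]

\[ = \frac{1}{2p} \left( (1-p)^{-i} + \frac{1 - (1-p)^{-n}}{1 - (1-p)^{-1}} - \frac{1 - (1-p)^{-i}}{1 - (1-p)^{-1}} \right) \]

\[ = \frac{1}{2p} \left( (1-p)^{-i} + \frac{1-p}{p} (1-p)^{-n} - \frac{1-p}{p} (1-p)^{-i} \right) \]

\[ = \frac{1}{2p^2} \left( (1-p)^{-n+1} - (1-2p)(1-p)^{-i} \right). \]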

The third bound (11) follows by simple calculations and the following case distinctions. Note that due to
the asymptotic term $-O(\log n)$ we only need to prove the bound for large $n$, i.e., for $n \ge n_0$ where we can
fix $n_0 \in \mathbb{N}$.

Observe that the bound (9) is never larger than $\hat\mu := 1/(2p^2(1-p)^{n-1})$, even for the special case
$X_\mu = 0$. If $\mu \ge \hat\mu$, then the probability that the optimum is found among the first $\hat\mu$ individuals
created during initialization is at most $\hat\mu \cdot 2^{-n} \le 1/n$ for $n$ large enough. This proves the
claimed lower bound.

If $\mu < \hat\mu$, then $P(X_\mu \ge \log(\mu/p)) \le \mu \cdot 2^{-\log(\mu/p)} = p$. Pessimistically assuming that
$X_\mu = \log(\mu/p)$ in case $X_\mu < \log(\mu/p)$ and estimating the conditional expected optimization time by 0 in
case $X_\mu \ge \log(\mu/p)$ results in the following bound.

\[ (1-p) \cdot \frac{1}{2p^2} \left( (1-p)^{-n+1} - (1-2p)(1-p)^{-\log(\mu/p)} \right)
   \ge \frac{1}{2p^2} \left( (1-p)^{-n+1} - (1-p)^{-\log(\mu/p)} - p(1-p)^{-n+1} \right). \]

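We use $(1-p)^{-n+1} = O(\ln n)$, which holds as $p \le \ln\ln n \cdot 1/n$, to estimate the term
$-p(1-p)^{-n+1}$. For the same reason $\log(\mu/p) \le \log(\hat\mu/p) = O(\log n)$, recalling $p \ge n^{-O(1)}$.
Assuming that $n$ is large enough to make $p \cdot \log(\mu/p) \le \ln\ln n \cdot O((\log n)/n) \le 1/2$,

\[ (1-p)^{-\log(\mu/p)} \le \frac{1}{1 - p \cdot \log(\mu/p)} = 1 + \frac{p \cdot \log(\mu/p)}{1 - p \cdot \log(\mu/p)} \le 1 + O(p \log n). \]

Together, we get a lower bound of

\[ \frac{1}{2p^2} \left( (1-p)^{-n+1} - 1 - O(p \log n) \right) = \frac{1}{2p^2} \left( (1-p)^{-n+1} - 1 \right) - \frac{O(\log n)}{p}. \]

□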
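
The equality of the sum (9) and the closed form (10) can be checked numerically. The following sketch is illustrative only. The function names are assumptions, and the distribution of $X_\mu$ is computed from the fact that a uniform random bit string has at least $t$ leading ones with probability $2^{-t}$.

```python
def max_lo_distribution(n: int, mu: int) -> list:
    """Return [P(X_mu = i) for i = 0..n], X_mu being the maximum LO-value
    among mu independent uniform random bit strings of length n."""
    # For a single string, P(LO <= i) = 1 - 2^{-(i+1)} for i < n and 1 for i = n.
    cdf = [(1.0 - 2.0 ** -(i + 1)) ** mu for i in range(n)] + [1.0]
    return [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, n + 1)]


def lo_bound_sum(n: int, p: float, mu: int) -> float:
    """Bound (9), evaluated term by term with u_i = p (1 - p)^i and chi = 1/2."""
    dist = max_lo_distribution(n, mu)
    inv_u = [1.0 / (p * (1.0 - p) ** i) for i in range(n)]
    total = 0.0
    suffix = 0.0  # sum of 1/u_j over all j > i
    for i in range(n - 1, -1, -1):
        total += dist[i] * (inv_u[i] + 0.5 * suffix)
        suffix += inv_u[i]
    return total


def lo_bound_closed(n: int, p: float, mu: int) -> float:
    """Bound (10), the closed form of (9)."""
    dist = max_lo_distribution(n, mu)
    return sum(
        dist[i] * ((1.0 - p) ** (1 - n) - (1.0 - 2.0 * p) * (1.0 - p) ** (-i)) / (2.0 * p * p)
        for i in range(n)
    )
```

Calling `lo_bound_sum(20, 0.05, 5)` and `lo_bound_closed(20, 0.05, 5)` should give the same value up to floating-point rounding.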



Note that a term $-O(\log n)/p$ is, in general, necessary since with, say, $\mu = n$ an EA will start with an
average of $\Theta(\log n)$ leading ones in the best search point. As the (1+1) EA with mutation probability
$\Theta(1/n)$ needs expected time $\Theta(n \log n)$ to collect $\Theta(\log n)$ leading ones, the (1+1) EA$_\mu$ needs
roughly $\Theta(n \log n) - n$ fewer generations than the (1+1) EA.

For the (1+1) EA$_\mu$, $u_i \gamma_{i,j}$ is the exact probability of jumping from fitness level $i$ to level
$j > i$. Also recall that all conditions (2) on the $\gamma_{i,j}$-values hold with equality. Therefore, defining
$s_i := u_i$ and using $\gamma_{i,j}$ and $\chi$ as in Theorem 5, we get an upper bound for the (1+1) EA$_\mu$ using
Theorem 4. It is easy to see that $(1 - \chi) s_i \le s_{i+1}$ for all $0 \le i \le n - 2$ as
$1/2 \cdot p(1-p)^i \le p(1-p)^{i+1}$ is equivalent to $1/2 \le 1 - p$. The resulting upper bound equals the lower
bounds (9) and (10) from Theorem 5.

As the upper bound holds for the (1+1) EA$_\mu$ but the lower bound holds for all mutation-based EAs, this
proves that among all mutation-based EAs the (1+1) EA$_\mu$ is an optimal algorithm for the function LO.

Theorem 6. The term (9) describes the exact expected optimization time of the (1+1) EA$_\mu$ with mutation
probability $0 < p \le 1/2$ on LO. Among all mutation-based EAs with mutation probability $0 < p \le 1/2$, the
(1+1) EA$_\mu$, for an appropriate choice of $\mu$, minimizes the expected number of function evaluations.

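For $\mu = 1$ we get the following.

Corollary 2. The expected optimization time of the (1+1) EA with mutation probability $0 < p \le 1/2$ on
LO is exactly

\[ \sum_{i=0}^{n-1} 2^{-i-1} \cdot \left( \frac{1}{p(1-p)^i} + \frac{1}{2} \sum_{j=i+1}^{n-1} \frac{1}{p(1-p)^j} \right)
   = \frac{1}{2p^2} \cdot \left( (1-p)^{-n+1} - (1-p) \right). \]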
The second bound follows from a simple but tedious calculation. It is omitted here. For $p = 1/n$ we get
that the expected running time of the (1+1) EA is

\[ \frac{n^2}{2} \cdot \left( \left( 1 - \frac{1}{n} \right)^{-n+1} - 1 + \frac{1}{n} \right). \]

The factor preceding $n^2$ converges to $(e - 1)/2$ from below. Note that we have reproduced one of the main
results from Böttcher, Doerr, and Neumann [3] for general mutation probabilities. The latter authors derived
the same formula and used it to compute the optimal mutation probability. They found that $p \approx 1.59/n$
is the optimal fixed mutation probability in that it minimizes the expected number of function evaluations.
Our lower-bound method allows for the same conclusions to be drawn. Even stronger, while Böttcher et
al. [3] only consider the (1+1) EA, we can make the following statement for the broad class of mutation-based
EAs.
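
The value $p \approx 1.59/n$ can be reproduced numerically from the closed form in Corollary 2. The sketch below uses a golden-section search. The search interval $[0.1/n, 10/n]$, the assumption that the formula is unimodal on it, and the function names are choices made for this illustration only.

```python
import math


def lo_expected_time(n: int, p: float) -> float:
    """Closed form from Corollary 2 for the (1+1) EA on LO."""
    return ((1.0 - p) ** (1 - n) - (1.0 - p)) / (2.0 * p * p)


def best_mutation_rate(n: int, iters: int = 100) -> float:
    """Golden-section search for the minimiser of lo_expected_time(n, .).

    Assumes the function is unimodal on [0.1/n, 10/n].
    """
    a, b = 0.1 / n, min(0.5, 10.0 / n)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    for _ in range(iters):
        if lo_expected_time(n, c) < lo_expected_time(n, d):
            b = d  # minimiser lies in [a, d]
        else:
            a = c  # minimiser lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0
```

For $n = 1000$ the product `n * best_mutation_rate(n)` should come out close to 1.59.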

Theorem 7. Among all mutation-based EAs the expected number of fitness evaluations on LO is minimized
by the (1+1) EA$_\mu$ with mutation probability $p = 1.59/n$ and $1 \le \mu = O(n \log n)$.

As shown by Böttcher, Doerr, and Neumann [3], the expected optimization time can be further decreased
by allowing adaptive schemes for choosing the mutation probability. Theorems 5 and 7 only apply to fixed
mutation rates. This is not due to a limitation of the lower-bound method. The method is applicable to
their adaptive algorithm as well. We refrain from going into detail as this would overlap to a large extent
with results already published in [3].
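
The exact formula from Corollary 2 can also be compared with an empirical average. The following sketch simulates the (1+1) EA on LO. It assumes that the optimization time counts mutation steps after initialization, with time 0 if the initial search point is already optimal. All function names are illustrative.

```python
import random


def leading_ones(x: list) -> int:
    """Number of consecutive 1-bits at the start of x."""
    count = 0
    for bit in x:
        if bit == 0:
            break
        count += 1
    return count


def one_plus_one_ea_lo(n: int, p: float, rng: random.Random) -> int:
    """Run the (1+1) EA on LO and return the number of mutation steps
    until the all-ones string is found."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = leading_ones(x)
    steps = 0
    while fx < n:
        # Standard bit-flip mutation: flip each bit independently with probability p.
        y = [bit ^ 1 if rng.random() < p else bit for bit in x]
        fy = leading_ones(y)
        steps += 1
        if fy >= fx:  # elitist selection, ties accepted
            x, fx = y, fy
    return steps


def exact_lo_time(n: int, p: float) -> float:
    """Closed form from Corollary 2."""
    return ((1.0 - p) ** (1 - n) - (1.0 - p)) / (2.0 * p * p)
```

Averaging `one_plus_one_ea_lo(16, 1/16, rng)` over a few hundred runs should give a value near `exact_lo_time(16, 1/16)`, up to sampling noise.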



6 A Lower Bound for OneMax 

We now turn to the function OneMax. This function is the easiest function with a unique global optimum
and it has been studied in the context of many search heuristics [TH [551 El 123 1131 EH]. In this section we
derive a lower bound for the expected running time of all mutation-based EAs on OneMax. This lower
bound will be very close to a simple upper bound for the (1+1) EA. Using the fitness-level method for upper
bounds, the expected running time of the (1+1) EA with mutation probability $p$ can easily be bounded as
follows.

Theorem 8. Let $H(n)$ denote the $n$-th harmonic number. For any initial search point, the expected running
time of the (1+1) EA with mutation probability $p$, $0 < p < 1$, is bounded from above by

\[ \frac{H(n)}{p(1-p)^{n-1}} \le \frac{\ln n + 1}{p(1-p)^{n}}. \]

Proof. Define the canonical fitness levels $A_i := \{x \mid \mathrm{OneMax}(x) = i\}$ for $0 \le i \le n$. The (1+1) EA
increases the current fitness level $i < n$ if only a single 0-bit flips and no 1-bit flips. This probability is
at least

\[ s_i \ge (n - i) \cdot p(1-p)^{n-1}, \]

resulting in the upper bound

\[ \sum_{i=0}^{n-1} \frac{1}{s_i} \le \frac{1}{p(1-p)^{n-1}} \sum_{i=0}^{n-1} \frac{1}{n-i} = \frac{H(n)}{p(1-p)^{n-1}}. \]

The second bound follows from $H(n) \le (\ln n) + 1$. □

We remark that Witt [51, Theorem 4] recently presented a similar, but more complicated upper bound.
It applies to all linear functions and also allows for tail bounds.
The main result in this section is the following lower bound.






Theorem 9. The expected optimization time of every mutation-based EA using mutation probability $p$ on
OneMax with $n \ge 2$ bits is at least

\[ \frac{\ln n - \ln\ln n - 3}{p(1-p)^{n}} \]

if $2^{-n/3} \le p \le 1/n$ and at least

\[ \frac{\ln(1/(p^2 n)) - \ln\ln n - 3}{p(1-p)^{n}} \]

if $1/n \le p \le 1/(\sqrt{n} \log n)$.


For the default mutation probability $p = 1/n$, we get the following using the common estimation
$1/n \cdot (1 - 1/n)^n \le 1/(en)$.

Corollary 3. The expected optimization time of every mutation-based EA using the default mutation
probability $p = 1/n$ on OneMax is at least

\[ en \ln n - en \ln\ln n - 3en. \]

Note that for mutation probabilities $p = \alpha/n$ for some polylogarithmic term $\alpha = \mathrm{polylog}(n)$
(defined as $O(\log^k n)$, $k > 0$ an arbitrary constant), the term $\ln(1/(p^2 n))$ in the second bound of
Theorem 9 simplifies to $\ln(n/\alpha^2) = \ln n - 2\ln(\alpha) = \ln n - o(\ln n)$. Hence, for mutation probabilities
up to $\mathrm{polylog}(n)/n$, Theorem 9 gives lower bounds that match the simple upper bound from Theorem 8 up to
lower-order terms.
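
To get a feeling for the gap between the two bounds at concrete problem sizes, both can be evaluated directly. The sketch below is illustrative and the function names are assumptions. As in the proof of Theorem 9, $\log$ denotes the logarithm to base 2 and $\ln$ the natural logarithm.

```python
import math


def onemax_upper_bound(n: int, p: float) -> float:
    """Theorem 8: H(n) / (p (1 - p)^(n - 1)) for the (1+1) EA."""
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return harmonic / (p * (1.0 - p) ** (n - 1))


def onemax_lower_bound(n: int, p: float) -> float:
    """Theorem 9 for both admissible ranges of p.

    Raises ValueError if p lies outside the ranges covered by the theorem.
    The value can be negative for small n, in which case the bound is trivial.
    """
    denom = p * (1.0 - p) ** n
    if 2.0 ** (-n / 3.0) <= p <= 1.0 / n:
        numer = math.log(n)
    elif 1.0 / n < p <= 1.0 / (math.sqrt(n) * math.log2(n)):
        numer = math.log(1.0 / (p * p * n))
    else:
        raise ValueError("p outside the range covered by Theorem 9")
    return (numer - math.log(math.log(n)) - 3.0) / denom
```

For $p = 1/n$ the lower bound is at least the value $en \ln n - en \ln\ln n - 3en$ from Corollary 3, and it never exceeds the upper bound from Theorem 8.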

An immediate conclusion from this result is that for the mentioned mutation probabilities the expected
running time of the (1+1) EA is dominated by the term $\frac{\ln n}{p(1-p)^{n-1}}$. (Recall that for all mutation
probabilities not covered by Theorem 9 the expected running time is exponential by Theorem 1.) As
$p(1-p)^{n-1}$ is maximized by the choice $p := 1/n$, the expected running time is minimized for this value,
assuming that $n$ is large enough. This establishes $p = 1/n$ as the optimal mutation rate for the (1+1) EA on
OneMax.

This finding has recently been derived independently by Witt [51]. His result holds for all linear functions.
The proof uses sophisticated drift analysis techniques. In this light it is surprising that the same statement 
(for OneMax) can be derived by simple fitness level arguments. This further demonstrates the strength of 
the new lower bound method. 

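In order to show Theorem 9, we first show the following upper bounds on transition probabilities by
mutation on OneMax. The lemma may be of independent interest.

Lemma 2. Let $p_{i,i+k}$ denote the probability that mutating a search point with $i$ 1-bits using mutation
probability $p$ results in an offspring with $i + k$ 1-bits. For every $k \in \mathbb{N}$ we have

\[ p_{i,i+k} \le p^k (1-p)^{n-k} \cdot \frac{(n-i)^k}{k!} \cdot \sum_{j=0}^{\infty} \left( \frac{i(n-i)p^2}{(1-p)^2} \right)^{j} \cdot \frac{1}{j!\,(j+1)!}. \]

If, additionally, $\frac{i(n-i)p^2}{(1-p)^2} \le 1$ and $i \ge 2n/3$ then for every $0 \le i' \le i$

\[ p_{i',i+k} \le p^k (1-p)^{n-k} \cdot \frac{(n-i)^k}{k!} \cdot \left( 1 + \frac{3}{5} \cdot \frac{i(n-i)p^2}{(1-p)^2} \right). \]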

The last statement means that, under the stated conditions, starting from a search point with a smaller
number $i' < i$ of 1-bits does not give a better guarantee on the probability of jumping to level $i + k$. This
statement always holds for mutation probability $p = 1/n$, even without the mentioned conditions. However,
for larger mutation probabilities this is non-trivial. There are examples where, under conditions different to
the ones in Lemma 2, $p_{i',i+k} > p_{i,i+k}$ for $i' < i$.
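
The second bound of Lemma 2 can be compared with the exact transition probabilities for small instances. The sketch below computes the exact probability as a sum over the number of flipping 1-bits and evaluates the bound. Function names are illustrative, and the check is restricted to jumps with $k \ge 1$.

```python
from math import comb, factorial


def transition_probability(n: int, p: float, i: int, target: int) -> float:
    """Exact probability that standard bit mutation turns a string with i ones
    into a string with `target` ones."""
    k = target - i
    total = 0.0
    for j in range(i + 1):          # j one-bits flip to zero
        zeros_flipped = k + j       # number of zero-bits that must flip to one
        if zeros_flipped < 0 or zeros_flipped > n - i:
            continue
        flips = j + zeros_flipped
        total += (comb(i, j) * comb(n - i, zeros_flipped)
                  * p ** flips * (1.0 - p) ** (n - flips))
    return total


def lemma2_bound(n: int, p: float, i: int, k: int) -> float:
    """Second bound of Lemma 2 on the probability of reaching i + k ones
    from any parent with at most i ones."""
    x = i * (n - i) * p * p / (1.0 - p) ** 2
    if k < 1 or x > 1.0 or 3 * i < 2 * n:
        raise ValueError("preconditions of Lemma 2 violated")
    return p ** k * (1.0 - p) ** (n - k) * (n - i) ** k / factorial(k) * (1.0 + 0.6 * x)
```

For example, with $n = 30$ and $p = 1/30$ one can compare `transition_probability(30, 1/30, i_prime, i + k)` against `lemma2_bound(30, 1/30, i, k)` for $i \ge 20$ and several $i' \le i$.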
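Proof of Lemma 2. An offspring with $i + k$ 1-bits is created if and only if there is an integer
$j \in \mathbb{N}_0$ such that $j$ 1-bits flip and $k + j$ 0-bits flip. Using $(k + j)! \ge k!\,(j + 1)!$ for all
$k \in \mathbb{N}$, $j \in \mathbb{N}_0$,

\[ p_{i,i+k} = \sum_{j=0}^{\infty} \binom{i}{j} \binom{n-i}{k+j} p^{k+2j} (1-p)^{n-k-2j} \]

\[ = p^k (1-p)^{n-k} \cdot \sum_{j=0}^{\infty} \binom{i}{j} \binom{n-i}{k+j} \left( \frac{p}{1-p} \right)^{2j} \]

\[ \le p^k (1-p)^{n-k} \cdot \sum_{j=0}^{\infty} \frac{i^j}{j!} \cdot \frac{(n-i)^{k+j}}{(k+j)!} \left( \frac{p}{1-p} \right)^{2j} \]

\[ \le p^k (1-p)^{n-k} \cdot \frac{(n-i)^k}{k!} \cdot \sum_{j=0}^{\infty} \frac{i^j (n-i)^j}{j!\,(j+1)!} \left( \frac{p}{1-p} \right)^{2j} \]

\[ = p^k (1-p)^{n-k} \cdot \frac{(n-i)^k}{k!} \cdot \sum_{j=0}^{\infty} \left( \frac{i(n-i)p^2}{(1-p)^2} \right)^{j} \cdot \frac{1}{j!\,(j+1)!}. \]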



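The second bound for $i' = i$ follows from

\[ \sum_{j=0}^{\infty} \left( \frac{i(n-i)p^2}{(1-p)^2} \right)^{j} \cdot \frac{1}{j!\,(j+1)!}
   \le 1 + \frac{i(n-i)p^2}{(1-p)^2} \cdot \sum_{j=1}^{\infty} \frac{1}{j!\,(j+1)!}
   \le 1 + \frac{i(n-i)p^2}{(1-p)^2} \cdot \frac{3}{5}. \]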



For $i' < i$ let $d := i - i'$. Note that $i \ge 2n/3$ implies

\[ \frac{p^2 (n-i)^2}{(1-p)^2} \le \frac{1}{2} \cdot \frac{p^2\, i(n-i)}{(1-p)^2} \le \frac{1}{2}, \]

hence $p(n-i)/(1-p) \le 1/\sqrt{2}$. Along with the first statement, we have

\[ p_{i-d,\,i+k} \;\le\; \dots \;\le\; p^k (1-p)^{n-k} \cdot \frac{(n-i)^k}{k!} \cdot \sum_{j=0}^{\infty} \left( \frac{i(n-i)p^2}{(1-p)^2} \right)^{j} \cdot \frac{1}{j!\,(j+1)!}, \]

where the intermediate steps estimate powers of $p(n-i)/(1-p)$ by $2^{-d/2}$. The right-hand side is bounded as in
the case $i' = i$.


Now we proceed with the proof of the lower bound. 
Proof of Theorem 9. Assume that $n \ge 91$ as otherwise both bounds are negative and the claim is trivial. If
(j, > p. := p^L'p)"-! then the probability that the first p, search points generated during initialization find 






the optimum is at most /i • 2 " <C 1/2, which establishes the lower bound /i/2 > ^{33^ an d proves both 
bounds. In the following we assume /i < p, and neglect the cost of initialization. 

Let $\ell = \lceil n - \min\{n/\log n,\ 1/(p^2 n \log n)\} \rceil$. Consider the following partition
$A_\ell, \dots, A_n$. Define $A_i = \{x \mid \mathrm{OneMax}(x) = i\}$ for $i > \ell$ and let $A_\ell$ contain all
remaining search points. With probability at least
$1 - \mu \cdot \sum_{i=0}^{n-\ell-1} \binom{n}{i} 2^{-n} \ge 1 - 1/(\log n)$ for $n \ge 91$ the initial population only
contains individuals on the first fitness level.

For $j > i$ let $p_{i,j}$ be the probability of the event that mutating an individual with $i$ ones results in an
offspring that contains $j$ ones. If $i \ge \ell$ then

\[ i(n-i)p^2 \le n(n-i)p^2 \le \frac{1}{\log n} \le (1-p)^2. \tag{12} \]

From Lemma 2 we know that then for every $k \in \mathbb{N}$ and every $i' \le i$

\[ p_{i',i+k} \le p^k (1-p)^{n-k} \cdot \frac{(n-i)^k}{k!} \cdot \left( 1 + \frac{3}{5} \cdot \frac{i(n-i)p^2}{(1-p)^2} \right). \]

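Without loss of generality, we can assume $i' := i$ in the following, i.e., that the algorithm always selects a
best individual from the population as parent. For $i \ge \ell$ define

\[ u_i' := p(1-p)^{n-1} \cdot (n-i) \cdot \left( 1 + \frac{3}{5} \cdot \frac{i(n-i)p^2}{(1-p)^2} \right)
   \quad \text{and} \quad \gamma_{i,i+k}' := \left( \frac{p(n-i)}{1-p} \right)^{k-1}, \]

where the prime indicates that these will not be the final variables used in the application of Theorem 3.
Observe that

\[ u_i' \gamma_{i,i+k}' = p(1-p)^{n-1} \cdot (n-i) \cdot \left( 1 + \frac{3}{5} \cdot \frac{i(n-i)p^2}{(1-p)^2} \right) \cdot \left( \frac{p(n-i)}{1-p} \right)^{k-1} \]

\[ = p^k (1-p)^{n-k} \cdot (n-i)^k \cdot \left( 1 + \frac{3}{5} \cdot \frac{i(n-i)p^2}{(1-p)^2} \right) \ge p_{i,i+k}. \]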



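Since Theorem 3 requires the $\gamma_{i,j}$-variables to sum up to 1, we consider the following normalized variables:
$u_i := u_i' \cdot \sum_{j=i+1}^{n} \gamma_{i,j}'$ and $\gamma_{i,j} := \gamma_{i,j}' / \sum_{j'=i+1}^{n} \gamma_{i,j'}'$.
As $u_i \gamma_{i,j} = u_i' \gamma_{i,j}' \ge p_{i,j}$, the conditions on the transition probabilities are fulfilled.
The condition $\gamma_{i,j} \ge \chi \sum_{k=j}^{n} \gamma_{i,k}$ is equivalent to
$\gamma_{i,j}' \ge \chi \sum_{k=j}^{n} \gamma_{i,k}'$. Also note that

\[ p(n-i) \le p(n-\ell) \le \frac{1}{\log n} \le 1 - p, \tag{13} \]

the second inequality following from $p(n-\ell) \le pn/\log n \le 1/\log n$ if $p \le 1/n$ and
$p(n-\ell) \le p/(p^2 n \log n) = 1/(pn \log n) \le 1/\log n$ if $p > 1/n$. Noting that $\frac{p(n-i)}{1-p} < 1$, we get

\[ \sum_{k=j}^{n} \gamma_{i,k}' = \sum_{k=j-i}^{n-i} \left( \frac{p(n-i)}{1-p} \right)^{k-1}
   \le \gamma_{i,j}' \cdot \sum_{k=0}^{\infty} \left( \frac{p(n-i)}{1-p} \right)^{k}
   = \gamma_{i,j}' \cdot \frac{1}{1 - \frac{p(n-i)}{1-p}}
   \le \gamma_{i,j}' \cdot \frac{1}{1 - \frac{1}{(1-p)\log n}}. \]

Hence, choosing $\chi := 1 - \frac{1}{(1-p)\log n}$ we obtain

\[ \gamma_{i,j}' \ge \left( 1 - \frac{1}{(1-p)\log n} \right) \sum_{k=j}^{n} \gamma_{i,k}' = \chi \sum_{k=j}^{n} \gamma_{i,k}' \]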



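as required. Now that all conditions are verified, we proceed by estimating the variables $u_i$. Bounding the
sum of the $\gamma_{i,j}'$-values as before,

\[ \sum_{j=i+1}^{n} \gamma_{i,j}' \le \sum_{j=0}^{\infty} \left( \frac{p(n-i)}{1-p} \right)^{j} \le \frac{1}{1 - \frac{1}{(1-p)\log n}}. \]

Using $1 + x \le 1/(1 - x)$ for $x < 1$ and (12) we get

\[ u_i \le p(1-p)^{n-1} \cdot (n-i) \cdot \left( 1 + \frac{3}{5} \cdot \frac{i(n-i)p^2}{(1-p)^2} \right) \cdot \frac{1}{1 - \frac{1}{(1-p)\log n}} \]

\[ \le p(1-p)^{n-1} \cdot (n-i) \cdot \frac{1}{1 - \frac{3}{5} \cdot \frac{i(n-i)p^2}{(1-p)^2}} \cdot \frac{1}{1 - \frac{1}{(1-p)\log n}} \]

\[ \le p(1-p)^{n-1} \cdot (n-i) \cdot \frac{1}{1 - \frac{3}{5} \cdot \frac{1}{(1-p)^2 \log n}} \cdot \frac{1}{1 - \frac{1}{(1-p)\log n}} \]

\[ \le p(1-p)^{n-1} \cdot (n-i) \cdot \frac{1}{1 - \frac{8}{5} \cdot \frac{1}{(1-p)^2 \log n}}. \]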



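Applying Theorem 3 and recalling that the algorithm is initialized on the first fitness level with probability
at least $1 - \frac{1}{\log n}$ yields the lower bound

\[ \left( 1 - \frac{1}{\log n} \right) \left( 1 - \frac{1}{(1-p)\log n} \right) \left( 1 - \frac{8/5}{(1-p)^2 \log n} \right) \cdot \frac{1}{p(1-p)^{n-1}} \sum_{i=\ell}^{n-1} \frac{1}{n-i} \]

\[ \ge \left( 1 - \frac{18/5}{(1-p)^2 \log n} \right) \cdot \frac{1-p}{p(1-p)^{n}} \cdot \sum_{i=1}^{n-\ell} \frac{1}{i}. \]

Since $\sum_{i=1}^{\lfloor r \rfloor} 1/i \ge \ln r$ for any $r \in \mathbb{R}^+$, the bound is at least

\[ \left( 1 - \frac{18/5}{(1-p)^2 \log n} \right) \cdot \frac{1-p}{p(1-p)^{n}} \cdot \ln\left( \min\left\{ \frac{n}{\log n},\ \frac{1}{p^2 n \log n} \right\} \right) \]

\[ = \left( 1 - \frac{18/5}{(1-p)^2 \log n} \right) \cdot \frac{1-p}{p(1-p)^{n}} \cdot \left( \ln\left( \min\{ n,\ 1/(p^2 n) \} \right) - \ln(\log n) \right). \]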

Note $\ln(\log n) = \ln((\ln n)/\ln 2) = \ln\ln n - \ln\ln 2 \le \ln\ln n + 0.37$. For $p \le 1/n$ and $n \ge 91$ the
lower bound simplifies to

\[ \left( 1 - \frac{2.6}{\ln n} \right) \cdot \frac{1}{p(1-p)^{n}} \cdot \left( \ln n - \ln(\log n) \right)
   \ge \frac{\ln n - (\ln\ln n - \ln\ln 2) - 2.6}{p(1-p)^{n}}
   \ge \frac{\ln n - \ln\ln n - 3}{p(1-p)^{n}}. \]
For $1/n \le p \le 1/(\sqrt{n} \log n)$, using again $n \ge 91$, we get

\[ \left( 1 - \frac{18/5}{(1-p)^2 \log n} \right) \cdot \frac{1-p}{p(1-p)^{n}} \cdot \left( \ln(1/(p^2 n)) - \ln\ln n + \ln\ln 2 \right)
   \;\ge\; \dots \;\ge\; \frac{\ln(1/(p^2 n)) - \ln\ln n - 3}{p(1-p)^{n}}. \]

□
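
The parameter choices of this proof can be checked numerically for concrete $n$ and $p$. The following sketch computes $\ell$ and $\chi$ as defined in the proof and verifies the inequality chains (12) and (13) on all levels $i \ge \ell$. The function names and the small floating-point tolerance are assumptions of this illustration.

```python
import math


def theorem9_parameters(n: int, p: float):
    """Return (ell, chi) as chosen in the proof of Theorem 9 (log is base 2)."""
    log_n = math.log2(n)
    gap = min(n / log_n, 1.0 / (p * p * n * log_n))
    ell = math.ceil(n - gap)
    chi = 1.0 - 1.0 / ((1.0 - p) * log_n)
    return ell, chi


def check_conditions(n: int, p: float, eps: float = 1e-12) -> bool:
    """Verify (12) and (13) for all levels ell <= i < n."""
    ell, _ = theorem9_parameters(n, p)
    inv_log = 1.0 / math.log2(n)
    if inv_log > (1.0 - p) ** 2 + eps or inv_log > (1.0 - p) + eps:
        return False
    for i in range(ell, n):
        if i * (n - i) * p * p > inv_log + eps:   # first part of (12)
            return False
        if p * (n - i) > inv_log + eps:           # first part of (13)
            return False
    return True
```

For $n = 1024$ and $p = 1/1024$ this gives $\ell = 922$ and a viscosity of roughly 0.9, which grows towards 1 as $n$ increases.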



The above lower bound holds for a very broad class of evolutionary algorithms. This indicates what
performance can be achieved by EAs using the most common mutation operator, and what the optimal
mutation rate is. It is interesting to note that the lower bound does not apply to all known search heuristics,
though. Some search heuristics can perform better, including local mutation operators flipping only a single
bit [2], quasirandom evolutionary algorithms [7], biased mutation operators [31], and genetic algorithms
with a fitness-invariant shuffling operator [24].

7 A Lower Bound for all Functions with Unique Optimum 

Intuitively, OneMax is the easiest function with a unique global optimum. The function gives the best 
possible hints to reach the optimum. This can be regarded as the task of finding a single target point in the 
search space. A lower bound for the time until this target is found also applies to a much broader class of 
functions. 

We therefore consider the class of functions with a unique global optimum. This class contains all linear
functions, all monotone functions (as defined in [8]), and all unimodal functions (when unimodality is defined
as having a single local optimum). It is even much broader as it also contains many multimodal problems,
needle functions, trap functions, and many more functions.

We first consider the lower bound for mutation probability $1/n$ from Corollary 3. Using arguments by
Doerr, Johannsen, and Winzen [10], we show that this lower bound transfers to all functions with a unique
global optimum. This yields a more precise result than the asymptotic bound $\Omega(n \log n)$ from unbiased
black-box complexity by Lehre and Witt [29].

In [10] the authors proved that the expected optimization time of the (1+1) EA with mutation probability
$1/n$ on OneMax is not larger than the expected optimization time of the (1+1) EA on any other function with
unique global optimum. Their proof extends to arbitrary mutation-based EAs with mutation probability $1/n$
in a straightforward way.

Theorem 10. The expected number of function evaluations for every mutation-based EA $\mathcal{A}$ with mutation
probability $1/n$ on every function $f$ with $n \ge 2$ bits and a unique global optimum is at least
$en \ln n - en \ln\ln n - 3en$.

Proof. For some $a \in \{0, 1\}^n$ denote by $f_a$ the function $f(x \oplus a)$ where $\oplus$ denotes the bit-wise
exclusive or. Observe that this transformation does not change the behavior of a mutation-based EA in any way,
i.e., all mutation-based EAs have the same runtime distribution on $f_a$ as on $f$. Hence, we do not lose
generality if we transform the function $f$ in such a way that $1^n$ is the global optimum.

Let $E_{\mathcal{A}}^{f}$ denote the expected optimization time of $\mathcal{A}$ on $f$ and assume that the algorithm
has already created search points $x_1, \dots, x_t$. Let $E_{\mathcal{A}}^{f}(i)$ be the minimum expected remaining
optimization time for $\mathcal{A}$ given that $\mathcal{A}$ has only created individuals on the first $i$ fitness levels
so far, formally $x_1, \dots, x_t \in A_0 \cup \dots \cup A_i$ with $A_0, \dots, A_n$ the canonical partition for OneMax.
Observe that by definition, since the conditions on $x_1, \dots, x_t$ are subsequently restricted,

\[ E_{\mathcal{A}}^{f}(n) \le E_{\mathcal{A}}^{f}(n-1) \le \dots \le E_{\mathcal{A}}^{f}(0). \]

Further define a more specific and slightly modified quantity for the (1+1) EA$_\mu$: let
$E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(i)$ be defined like $E^{\mathrm{OneMax}}_{(1+1)\,\mathrm{EA}_\mu}(i)$, but with
the additional condition that the history $x_1, \dots, x_t$ contains at least one search point in $A_i$. Since we
have only added a constraint, $E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(i) \ge E^{\mathrm{OneMax}}_{(1+1)\,\mathrm{EA}_\mu}(i)$.

Following Doerr, Johannsen, and Winzen [10], we now prove inductively that for all $i$ it holds
$E_{\mathcal{A}}^{f}(i) \ge E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(i)$. Clearly
$E_{\mathcal{A}}^{f}(n) \ge E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(n) = 0$. Assume
$E_{\mathcal{A}}^{f}(j) \ge E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(j)$ for all $j > i$. Let $x'$ be
the next offspring constructed by $\mathcal{A}$. If the best OneMax-value seen so far is at most $i$ and
$\mathrm{OneMax}(x') = k > i$ then the expected remaining optimization time is at best $E_{\mathcal{A}}^{f}(k)$ (or
larger). If the new offspring has a smaller number of ones, the remaining expected optimization time is still
bounded below by $E_{\mathcal{A}}^{f}(i)$. Thus, using the assumption of our induction,

\[ E_{\mathcal{A}}^{f}(i) \ge 1 + \sum_{k=i+1}^{n} P(\mathrm{OneMax}(x') = k) \cdot E_{\mathcal{A}}^{f}(k) + P(\mathrm{OneMax}(x') \le i) \cdot E_{\mathcal{A}}^{f}(i) \]

\[ \ge 1 + \sum_{k=i+1}^{n} P(\mathrm{OneMax}(x') = k) \cdot E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(k) + P(\mathrm{OneMax}(x') \le i) \cdot E_{\mathcal{A}}^{f}(i). \]

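The best distribution for $\mathrm{OneMax}(x')$ is obtained when a parent $z$ with exactly $i$ ones is selected. A
formal proof of this claim is given in [10, Lemma 11]. (Note that the probability of selecting such a $z$ might
be 0, in which case the real bound is even larger.) Let $Z$ be the random number of ones when mutating $z$, then

\[ E_{\mathcal{A}}^{f}(i) \ge 1 + \sum_{k=i+1}^{n} P(Z = k) \cdot E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(k) + P(Z \le i) \cdot E_{\mathcal{A}}^{f}(i). \]

On one hand this is equivalent to

\[ E_{\mathcal{A}}^{f}(i) \ge \frac{1 + \sum_{k=i+1}^{n} P(Z = k) \cdot E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(k)}{1 - P(Z \le i)}. \tag{14} \]

On the other hand for the (1+1) EA$_\mu$ on OneMax we have

\[ E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(i) = 1 + \sum_{k=i+1}^{n} P(Z = k) \cdot E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(k) + P(Z \le i) \cdot E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(i), \]

which is equivalent to

\[ E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(i) = \frac{1 + \sum_{k=i+1}^{n} P(Z = k) \cdot E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(k)}{1 - P(Z \le i)}. \tag{15} \]

Taking (14) and (15) together yields $E_{\mathcal{A}}^{f}(i) \ge E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(i)$. Moreover,
$E^{\mathrm{OneMax}*}_{(1+1)\,\mathrm{EA}_\mu}(i) \ge E^{\mathrm{OneMax}}_{(1+1)\,\mathrm{EA}_\mu}(i)$. As $\mathcal{A}$
and (1+1) EA$_\mu$ are initialized in the same way, they share the same distribution for the initial fitness level.
We conclude $E_{\mathcal{A}}^{f} \ge E^{\mathrm{OneMax}}_{(1+1)\,\mathrm{EA}_\mu}$ and the bound follows from Corollary 3
applied to (1+1) EA$_\mu$. □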

Witt [51] recently generalized the above proof towards arbitrary mutation probabilities and stochastic
dominance. The latter is a stronger statement than a comparison of expectations. If the running time of an 
algorithm A dominates that of B then this implies that the expected running time of A is higher than that 
of B. 

The generalization towards arbitrary mutation probabilities p < 1/2 is non-trivial. In contrast to the 
above proof, it is not always the case that choosing the parent with the largest number of 1-bits yields the 
best progress. For this reason, we just cite his result here. 
Theorem 11 (Witt [51]). Consider a mutation-based EA $\mathcal{A}$ with population size $\mu$ and mutation
probability $p \le 1/2$ on any function with a unique global optimum. Then the optimization time of $\mathcal{A}$ is
stochastically at least as large as the optimization time of the (1+1) EA$_\mu$ on OneMax.

This immediately implies that the lower bound from Theorem 9 transfers to every function with a unique global
optimum.

Theorem 12. The expected optimization time of every mutation-based EA using mutation probability $p$ on
every function with a unique optimum is at least

\[ \frac{\ln n - \ln\ln n - 3}{p(1-p)^{n}} \]

if $2^{-n/3} \le p \le 1/n$ and at least

\[ \frac{\ln(1/(p^2 n)) - \ln\ln n - 3}{p(1-p)^{n}} \]

if $1/n \le p \le 1/(\sqrt{n} \log n)$.

As a side result, we have also shown that the (1+1) EA$_\mu$ is an optimal algorithm for OneMax. For
every fixed value of $\mu$ the (1+1) EA$_\mu$ is never worse than any other algorithm initialized with $\mu$ uniform
random individuals. It is interesting to note that, as for LO, the (1+1) EA, i.e., the (1+1) EA$_\mu$ with
$\mu = 1$, is generally not the best mutation-based algorithm for OneMax. In fact, for a proper choice of $\mu$ and
reasonable $p$ the (1+1) EA$_\mu$ has a strictly smaller expected optimization time.

Compare, for instance, the (1+1) EA with the (1+1) EA$_\mu$ for $p = 1/n$ and $\mu = \Theta(\log n)$. For both we
consider the time until the algorithms find a search point with at least $n/2 + \sqrt{n}$ 1-bits. It is known that
the probability that initialization creates a search point with at least $n/2 + \sqrt{n}$ 1-bits is at least a
constant. Hence, with high probability the (1+1) EA$_\mu$ will start with at least this value after initialization.
(The running time in case this does not happen is negligible.)

In contrast, if the (1+1) EA starts with $i < n/2 + \sqrt{n}$ 1-bits then by simple drift arguments it needs
expected time at least $n/2 + \sqrt{n} - i$ to reach a search point with fitness at least $n/2 + \sqrt{n}$. The reason is that the
expected progress is clearly bounded by the expected number of flipping bits, which is 1. It is not hard to
see that the (1+1) EA then needs $\Omega(\sqrt{n})$ generations in expectation to reach the threshold.

As both algorithms behave equally after having reached the threshold (modulo possible small differences
for overshooting the threshold), the (1+1) EA$_\mu$ is faster than the (1+1) EA by an additive term of
$\Omega(\sqrt{n}) - \Theta(\log n) = \Omega(\sqrt{n})$.

Note that $\mu$ cannot be too large, either. It is known that, with high probability, the number of 1-bits in a
random search point is at most $n/2 + \sqrt{n}\log n$. If $\mu = \omega(\sqrt{n}\log n)$ then the (1+1) EA gets to this threshold
faster than the (1+1) EA$_\mu$.

Theorem 13. Among all mutation-based EAs the expected number of fitness evaluations on OneMax is
minimized by the (1+1) EA$_\mu$ with mutation probability p = 1/n and $1 < \mu = O(\sqrt{n}\log n)$.
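The algorithm discussed here can be sketched in a few lines. The code below is an illustration under assumptions, not the paper's formal definition: it assumes that the (1+1) EA$_\mu$ evaluates $\mu$ uniform random search points, keeps the best one, and then proceeds like the (1+1) EA with standard bit mutation, accepting offspring that are not worse. All function names are hypothetical.

```python
import random


def onemax(x):
    """Number of 1-bits in the bit list x."""
    return sum(x)


def one_plus_one_ea_mu(n, mu, p, rng):
    """Run a (1+1) EA_mu sketch on OneMax and return the number of fitness
    evaluations until the optimum is evaluated for the first time."""
    if mu < 1:
        raise ValueError("mu must be at least 1")
    evaluations = 0
    best, best_fit = None, -1
    # Initialization: mu uniform random individuals, keep the best one (assumption).
    for _ in range(mu):
        x = [rng.randint(0, 1) for _ in range(n)]
        fx = onemax(x)
        evaluations += 1
        if fx == n:
            return evaluations
        if fx > best_fit:
            best, best_fit = x, fx
    # Main loop: flip each bit independently with probability p.
    while True:
        y = [1 - b if rng.random() < p else b for b in best]
        fy = onemax(y)
        evaluations += 1
        if fy == n:
            return evaluations
        if fy >= best_fit:
            best, best_fit = y, fy


if __name__ == "__main__":
    rng = random.Random(42)
    n = 100
    for mu in (1, 7):
        runs = [one_plus_one_ea_mu(n, mu, 1.0 / n, rng) for _ in range(200)]
        print(mu, sum(runs) / len(runs))
```

Since the differences between choices of $\mu$ only concern additive lower-order terms, averages over a few hundred runs are dominated by noise; the sketch is meant to make the cost model (initial evaluations count) explicit rather than to reproduce the effect empirically.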

This result contrasts with the result by Borisovsky and Eremeev [5] on the optimality of the (1+1) EA on
OneMax. The authors do not consider the impact of initialization. Strictly speaking, their concept of 
dominance does not generally hold when comparing an algorithm with the (1+1) EA that is initialized in a 
different way. 

As a word of caution, we remark that it is clearly not worth optimizing for $\mu$ in practice as the differences
in the expected running time only concern additive terms of small order.

8 An Exponential Lower Bound for Long k-Paths

Finally, we extend the proposed lower-bound method towards settings where many transition probabilities 
have to be considered. A common setting is that transition probabilities to the next few higher fitness levels
can be estimated quite easily. But if there are many fitness levels, dealing with transitions to fitness levels that
are "far away" can become tedious. Also, in some settings condition (2) on the transition probabilities
may be violated when transition probabilities become very small. If this only happens when the transition
probabilities are very small anyway, we still expect the lower bound from Theorem 3 to hold, apart from
small error terms.

This reasoning is made precise in the following theorem. For each fitness level we only consider the next
d fitness levels, where $d \in \mathbb{N}$ can be chosen arbitrarily. The conditions involving transition probabilities
only need to hold for these values. If $d \ll m$ this means that we only have to consider a tiny fraction of all
transition probabilities. We also introduce a variable $\alpha$ as a lower bound for the probability that a transition
is only made to these d levels. The resulting bound equals the one from Theorem 3 apart from a factor $\alpha^{m-i}$.
This factor can be regarded as (an upper bound on) the probability that the algorithm on every fitness level
makes jumps up to at most d fitness levels.

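Theorem 14. Consider an algorithm A and a partition of the search space into non-empty sets $A_1, \dots, A_m$.
Choose $d \in \mathbb{N}$ and let the probability of A traversing from level i to level j with $i < j \le i + d$ in one step be at most
$u_i \cdot \gamma_{i,j}$, where $\sum_{j=i+1}^{m} \gamma_{i,j} = 1$.

Define $\alpha = \alpha(d)$ such that $\alpha \le \sum_{j=1}^{d} \gamma_{i,i+j}$ for all $1 \le i \le m - d - 1$. Assume that for all $i < j \le i + d$
and some $0 < \chi \le 1$ it holds

$$\gamma_{i,j} \ge \chi \sum_{k=j}^{m} \gamma_{i,k}. \tag{16}$$

Then the expected hitting time of $A_m$ is at least

$$\sum_{i=1}^{m-1} P(A \text{ starts in } A_i) \cdot \alpha^{m-i} \cdot \left( \frac{1}{u_i} + \chi \sum_{j=i+1}^{m-1} \frac{1}{u_j} \right) \tag{17}$$

$$\ge \sum_{i=1}^{m-1} P(A \text{ starts in } A_i) \cdot \alpha^{m-i} \, \chi \sum_{j=i}^{m-1} \frac{1}{u_j}. \tag{18}$$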

As the proof is very similar to the proof of Theorem 3, it is omitted. Alternatively, the statement can be
proven by conditioning on the event that in each improvement of the current best fitness level the algorithm
advances by at most d levels, and applying the law of total expectation.
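Once values for $u_i$, $\alpha$ and $\chi$ have been chosen, the bound of Theorem 14 is a plain weighted sum and can be evaluated directly. The sketch below is an illustration only: it assumes the form $\sum_i P(A \text{ starts in } A_i) \cdot \alpha^{m-i} \cdot (1/u_i + \chi \sum_{j=i+1}^{m-1} 1/u_j)$ for the bound, the function name and the list-based interface are hypothetical, and it does not check the conditions of the theorem.

```python
def truncated_level_bound(start_probs, u, alpha, chi):
    """Evaluate the fitness-level lower bound with truncated jumps.

    start_probs[i-1] is P(A starts in A_i) and u[i-1] is u_i, for i = 1..m-1.
    The caller is responsible for choosing alpha and chi such that the
    conditions of the theorem hold; they are not verified here.
    """
    m = len(u) + 1
    if len(start_probs) != m - 1:
        raise ValueError("start_probs and u must both have m-1 entries")
    total = 0.0
    tail = 0.0  # holds sum_{j=i+1}^{m-1} 1/u_j while processing level i
    for i in range(m - 1, 0, -1):  # i = m-1, ..., 1
        inv_u = 1.0 / u[i - 1]
        total += start_probs[i - 1] * alpha ** (m - i) * (inv_u + chi * tail)
        tail += inv_u
    return total


if __name__ == "__main__":
    # Three levels, start in A_1, no truncation loss (alpha = chi = 1).
    print(truncated_level_bound([1.0, 0.0], [0.5, 0.25], 1.0, 1.0))
```

With $\alpha = \chi = 1$ the value reduces to the sum of the expected waiting times $1/u_i$ from the initial level upwards; smaller values of $\alpha$ and $\chi$ shrink the bound accordingly.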

A prime example for a setting where the new method is applicable is the class of long k-paths. These
functions were introduced by Horn, Goldberg, and Deb [QI5], formally defined by Rudolph [39], and analyzed
by Droste, Jansen, and Wegener [15]. We stick to a slightly cleaner formulation from [40]. A long k-path is a
sequence of search points, called a path. Two neighboring points on the path differ in exactly one bit. Assigning
increasing fitness values to the points on the path enables an EA to climb up the path. All search points
outside the path have worse fitness and they give hints to reach the start of the path.

The parameter k indicates the distance between different parts of the path. For all points x on the path,
the i-th successor has Hamming distance i to x, for $1 \le i < k$. All further successors of x have Hamming
distance at least k to x. This means that in order to take a shortcut on the path, an EA must flip at least k
bits at the same time. If k is not too small, an EA typically climbs to the end of the path in small steps.
For $k = \sqrt{n}$ the probability of taking a shortcut is exponentially small, and the length of the path is still
exponential. More precisely, the length of a long k-path on n bits is $k \cdot 2^{n/k} - k$ [HI SO].
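The Hamming-distance property described above can be checked on small instances. The sketch below uses one common recursive construction of such a path (two copies of the shorter path, prefixed with $0^k$ and $1^k$, joined by a bridge of $k-1$ points); this construction and the function names are assumptions made for illustration and need not coincide in every detail with the formulation cited in the text. In particular, under this indexing the path has $k \cdot 2^{n/k} - k + 1$ points, so counting conventions may differ by one.

```python
def long_k_path(n, k):
    """Return the points of a long k-path on n bits as bit strings (n divisible by k).

    Assumed recursive construction: start with the empty string; in every step
    prepend 0^k to the current path, add k-1 bridge points, and append the
    reversed current path prefixed with 1^k.
    """
    if k < 1 or n % k != 0:
        raise ValueError("n must be a positive multiple of k")
    path = [""]
    for _ in range(n // k):
        last = path[-1]
        lower = ["0" * k + v for v in path]
        bridge = ["0" * (k - j) + "1" * j + last for j in range(1, k)]
        upper = ["1" * k + v for v in reversed(path)]
        path = lower + bridge + upper
    return path


def hamming(a, b):
    """Hamming distance of two bit strings of equal length."""
    return sum(c != d for c, d in zip(a, b))


if __name__ == "__main__":
    n, k = 6, 3
    path = long_k_path(n, k)
    for a in range(len(path)):
        for b in range(a + 1, len(path)):
            dist = hamming(path[a], path[b])
            # i-th successor has distance i for i < k, at least k otherwise
            assert dist == b - a if b - a < k else dist >= k
    print(len(path), path)
```

The nested loop is a brute-force check and only feasible for small n, since the number of points grows exponentially in n/k.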

Long k-paths are a prime example for this extension because they give rise to a potentially exponential
number of fitness values. For every point on the path, the Hamming distances to the next k successors on
the path are well known. But for all further search points we only know that they have Hamming distance
at least k. Putting d := k, it is easy to apply the modified lower-bound method.

For simplicity, we assume that the (1+1) EA is initialized with the first point on the long k-path. This
is not an essential restriction. It is very unlikely that the long k-path is reached beforehand as the "density" of
points on the long k-path is extremely low, for reasonable values of k. By definition, each Hamming ball of
radius k/2 contains roughly $n^{k/2}/(k/2)!$ search points, but at most k of these can be part of the long k-path.
This means that it is extremely unlikely to find a point on the path by chance (except for the first k points),
while being guided towards the start of the path.

Theorem 15. Consider the (1+1) EA with mutation probability p starting at the first point of the long
k-path. Let $m + 1 = k \cdot 2^{n/k} - k$ be the number of search points on the long k-path, then the expected
optimization time of the (1+1) EA is at least

$$m \cdot \frac{1}{p(1-p)^{n-1}} \cdot \left(\frac{1-2p}{1-p}\right)^{2} \cdot \left(1 - \left(\frac{p}{1-p}\right)^{k}\right)^{m}.$$

In order to make sense of this lower bound, note that the term $m \cdot \frac{1}{p(1-p)^{n-1}}$ reflects the expected time to
make m specific 1-bit flips. This would be the exact expected optimization time if the (1+1) EA would
never accept a mutation that flips more than one bit. It also represents an upper bound on the expected
optimization time of the (1+1) EA by a straightforward application of Theorem O. The term $\left(\frac{1-2p}{1-p}\right)^{2}$ is
necessary to account for successful mutations that flip more than one bit. The last term $\left(1 - \left(\frac{p}{1-p}\right)^{k}\right)^{m}$
roughly equals the probability that no improving mutation makes a progress by more than k on the path on
all fitness levels.

For the common choice $k = \sqrt{n}$ we get the following. The bound from Theorem 15 is simplified by
applying the inequality $(1 - x)^m \ge e^{-2xm}$ for $0 \le x \le 1/2$ and $m \ge 1$ to $x := (p/(1-p))^{\sqrt{n}}$.

Corollary 4. Consider the (1+1) EA with mutation probability $0 < p \le 1/3$ starting at the first point of
the long k-path with $k = \sqrt{n}$. Then the expected optimization time of the (1+1) EA is at least

$$\frac{m}{p(1-p)^{n-1}} \cdot \left(\frac{1-2p}{1-p}\right)^{2} \cdot \exp\left(-2m \left(\frac{p}{1-p}\right)^{\sqrt{n}}\right), \quad \text{where } m = \sqrt{n}\, 2^{\sqrt{n}} - \sqrt{n} - 1.$$

For every $0 < p = o(1)$ the expectation is

$$\frac{\sqrt{n}\, 2^{\sqrt{n}} - \sqrt{n}}{p(1-p)^{n-1}} \cdot (1 - o(1)),$$

i. e., upper and lower bounds are tight up to lower-order terms. Furthermore, the choice p = 1/n for the
mutation probability minimizes the expected number of function evaluations of the (1+1) EA in this setting
if n is large enough.

The dominant term for p = 1/n is $e\, n^{3/2}\, 2^{\sqrt{n}}$. The leading constant is by a factor of 2e larger than the
leading constant in the previous best known lower bound $1/2 \cdot n^{3/2}\, 2^{\sqrt{n}}$. The latter can be derived from
enhancing the proof of Theorem 23 in [15] with modern drift analysis techniques, and assuming that the
(1+1) EA starts on the first point of the path.
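The size of these quantities is easier to compare in log-space, since the path length is exponential in $\sqrt{n}$. The sketch below evaluates the logarithm of a bound of the form $m \cdot \frac{1}{p(1-p)^{n-1}} \cdot \left(\frac{1-2p}{1-p}\right)^2 \cdot \left(1 - \left(\frac{p}{1-p}\right)^k\right)^m$ with $k = \sqrt{n}$ and $m + 1 = k \cdot 2^{k} - k$, and compares it with the dominant term $e\, n^{3/2}\, 2^{\sqrt{n}}$. This form of the bound is an assumption of the sketch, as are the function names; n is assumed to be a perfect square and p < 1/2.

```python
import math


def log_long_path_bound(n, p):
    """Natural logarithm of the assumed lower bound for k = sqrt(n)."""
    k = math.isqrt(n)
    if k * k != n:
        raise ValueError("n must be a perfect square")
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie in (0, 1/2)")
    m = k * 2 ** k - k - 1          # number of path points minus one
    q = p / (1.0 - p)
    return (
        math.log(m)
        - math.log(p)
        - (n - 1) * math.log1p(-p)                  # 1 / (1-p)^(n-1)
        + 2.0 * math.log((1.0 - 2.0 * p) / (1.0 - p))
        + m * math.log1p(-(q ** k))                 # (1 - q^k)^m
    )


def log_dominant_term(n):
    """Natural logarithm of e * n^(3/2) * 2^sqrt(n)."""
    return 1.0 + 1.5 * math.log(n) + math.sqrt(n) * math.log(2.0)


if __name__ == "__main__":
    n = 100
    ratio = math.exp(log_long_path_bound(n, 1.0 / n) - log_dominant_term(n))
    print(ratio)  # approaches 1 as n grows
```

For very large n the integer m no longer fits into a float in the last product; a more careful implementation would keep that term in log-space as well.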

Proof of Theorem 15. Consider the canonical fitness-level partition $A_0, \dots, A_m$, i. e., $A_0$ contains the first
point $0^n$ on the path and $A_m$ contains the last point on the path. The transition probabilities are cut off after
a jump length of d := k, where k is the parameter of the long k-path. For all $0 \le i < m$ and $1 \le j \le m - i$
define

$$u_i := (1-p)^n \cdot \sum_{\ell=1}^{m-i} \left(\frac{p}{1-p}\right)^{\ell}$$

and

$$\gamma_{i,i+j} := \frac{\left(\frac{p}{1-p}\right)^{j}}{\sum_{\ell=1}^{m-i} \left(\frac{p}{1-p}\right)^{\ell}}.$$

Intuitively, by defining these values we pretend that the j-th successor on the path has Hamming distance j,
for all j, not just for $1 \le j < k$. It is obvious from the definition that $\sum_{j=i+1}^{m} \gamma_{i,j} = 1$ for all $0 \le i < m$.
For $1 \le j \le d$ we have

$$u_i \cdot \gamma_{i,i+j} = (1-p)^n \cdot \left(\frac{p}{1-p}\right)^{j} = p^j (1-p)^{n-j},$$

which is precisely the probability of mutation reaching the j-th successor of the current search point on the
path.
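Define

$$\chi := \frac{1}{\sum_{j=0}^{m-1} \left(\frac{p}{1-p}\right)^{j}},$$

then for $1 \le j \le d$ condition (16), applied to $\gamma_{i,i+j}$, resolves to

$$\frac{\left(\frac{p}{1-p}\right)^{j}}{\sum_{a=1}^{m-i} \left(\frac{p}{1-p}\right)^{a}} \ge \frac{1}{\sum_{a=0}^{m-1} \left(\frac{p}{1-p}\right)^{a}} \cdot \frac{\sum_{\ell=j}^{m-i} \left(\frac{p}{1-p}\right)^{\ell}}{\sum_{a=1}^{m-i} \left(\frac{p}{1-p}\right)^{a}}$$

and this is equivalent to

$$1 \ge \frac{1}{\sum_{a=0}^{m-1} \left(\frac{p}{1-p}\right)^{a}} \cdot \sum_{\ell=j}^{m-i} \left(\frac{p}{1-p}\right)^{\ell-j},$$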

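which is true since $j \ge 1$. Now for all $0 \le i \le m - d - 1$ we need to define $\alpha$ as a lower bound for

$$\sum_{j=1}^{d} \gamma_{i,i+j} = \frac{\sum_{j=1}^{d} \left(\frac{p}{1-p}\right)^{j}}{\sum_{j=1}^{m-i} \left(\frac{p}{1-p}\right)^{j}}.$$

The worst case is obtained for i = 0 where we get

$$\frac{\sum_{j=1}^{d} \left(\frac{p}{1-p}\right)^{j}}{\sum_{j=1}^{m} \left(\frac{p}{1-p}\right)^{j}}
= \frac{\frac{p}{1-p} - \left(\frac{p}{1-p}\right)^{d+1}}{\frac{p}{1-p} - \left(\frac{p}{1-p}\right)^{m+1}}
= 1 - \frac{\left(\frac{p}{1-p}\right)^{d+1} - \left(\frac{p}{1-p}\right)^{m+1}}{\frac{p}{1-p} - \left(\frac{p}{1-p}\right)^{m+1}}
\ge 1 - \left(\frac{p}{1-p}\right)^{d} =: \alpha.$$

Applying Theorem 14 yields the lower bound

$$\alpha^{m} \, \chi \sum_{i=0}^{m-1} \frac{1}{u_i}
= \left(1 - \left(\frac{p}{1-p}\right)^{k}\right)^{m} \cdot \frac{1}{\sum_{j=0}^{m-1} \left(\frac{p}{1-p}\right)^{j}} \cdot \sum_{i=0}^{m-1} \frac{1}{(1-p)^n \sum_{j=1}^{m-i} \left(\frac{p}{1-p}\right)^{j}}$$

$$\ge m \cdot \frac{1}{p(1-p)^{n-1}} \cdot \left(\frac{1-2p}{1-p}\right)^{2} \cdot \left(1 - \left(\frac{p}{1-p}\right)^{k}\right)^{m},$$

where the last step bounds both geometric sums by their infinite counterparts, $\sum_{j \ge 0} \left(\frac{p}{1-p}\right)^{j} = \frac{1-p}{1-2p}$ for p < 1/2.

□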
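The choices made in this proof can be sanity-checked numerically on a small instance. The sketch below assumes the definitions $u_i = (1-p)^n \sum_{\ell=1}^{m-i} q^{\ell}$, $\gamma_{i,i+j} = q^j / \sum_{\ell=1}^{m-i} q^{\ell}$, $\chi = 1/\sum_{j=0}^{m-1} q^j$ and $\alpha = 1 - q^d$ with $q = p/(1-p)$ and d = k; the function name is hypothetical. It verifies the normalization of the $\gamma$ values, the product $u_i \gamma_{i,i+j}$, the condition on $\chi$, the condition on $\alpha$, and compares the resulting sum with the closed form $m \cdot \frac{1}{p(1-p)^{n-1}} \cdot \left(\frac{1-2p}{1-p}\right)^2 \cdot \alpha^m$.

```python
import math


def check_long_path_proof(n, k, p):
    """Check the assumed parameter choices; return (bound from sums, closed form)."""
    q = p / (1.0 - p)
    m = k * 2 ** (n // k) - k - 1      # levels A_0, ..., A_m
    d = k

    def geo(a, b):
        """Finite geometric sum q^a + ... + q^b."""
        return sum(q ** l for l in range(a, b + 1))

    chi = 1.0 / geo(0, m - 1)
    alpha = 1.0 - q ** d
    inv_u_sum = 0.0
    for i in range(m):
        norm = geo(1, m - i)
        u_i = (1.0 - p) ** n * norm
        gamma = [q ** j / norm for j in range(1, m - i + 1)]  # gamma[j-1] = gamma_{i,i+j}
        assert abs(sum(gamma) - 1.0) < 1e-12
        for j in range(1, min(d, m - i) + 1):
            # u_i * gamma_{i,i+j} equals p^j (1-p)^(n-j)
            assert math.isclose(u_i * gamma[j - 1], p ** j * (1.0 - p) ** (n - j), rel_tol=1e-9)
            # condition on chi: gamma_{i,i+j} >= chi * sum of gamma_{i,i+l} for l >= j
            assert gamma[j - 1] >= chi * sum(gamma[j - 1:]) - 1e-12
        if i <= m - d - 1:
            # alpha is a lower bound on the total weight of the next d levels
            assert alpha <= sum(gamma[:d]) + 1e-12
        inv_u_sum += 1.0 / u_i
    bound = alpha ** m * chi * inv_u_sum
    closed = m / (p * (1.0 - p) ** (n - 1)) * ((1.0 - 2.0 * p) / (1.0 - p)) ** 2 * alpha ** m
    return bound, closed


if __name__ == "__main__":
    bound, closed = check_long_path_proof(n=12, k=3, p=0.1)
    print(bound, closed, bound >= closed)
```

The finite sums are slightly smaller than their infinite counterparts, so the value computed from the sums lies marginally above the closed form; the gap comes from the levels close to the end of the path.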



9 Conclusions 

We have presented a new method for proving lower bounds on the expected optimization time of randomized
search heuristics. The method is based on an adaptation of the fitness-level method, with additional
conditions on transition probabilities. It is intuitive, elegant, versatile, and easy to apply as one can freely
choose values for $\chi$, $u_i$, and $\gamma_{i,j}$ ($1 \le i < j \le m$) subject to the required conditions. As a side result, it has
also led to a refinement of the well-known upper-bound method with fitness levels.

The lower-bound method has been accompanied by several applications to a broad range of evolutionary 
algorithms. To this end, we have introduced the class of mutation-based evolutionary algorithms. It captures 
all EAs that only use mutation, regardless of parent selection or population models. We have derived very 
precise lower bounds for LO, OneMax, and all functions with a unique global optimum. These bounds apply 
to all mutation-based EAs. Such a generality was previously only known for black-box complexity results. 
A further application for the (1+1) EA on long k-paths has shown that the method still yields tight lower
bounds, even when considering only a tiny fraction of all transition probabilities. 

All bounds are parametrized with the mutation probability p. The lower bounds for LO, OneMax, and 
long k-paths are tight, compared with upper bounds for the (1+1) EA, up to smaller-order terms, for all
reasonable mutation probabilities. This is a rare occasion of results that are both very general and very 
precise at the same time. 

The results have also allowed us to formally identify optimal mutation-based EAs for LO and OneMax, i. e.,
which algorithm minimizes the expected number of fitness evaluations. In both cases this is a variant of the
(1+1) EA that creates more than one search point uniformly at random during initialization. Furthermore,
we have seen that $p \approx 1.59/n$ is an optimal fixed mutation rate for LO (see [3]), p = 1/n is optimal for
OneMax (see [ITI]) and p = 1/n is optimal for the (1+1) EA on long k-paths. These very strong conclusions
further demonstrate the strength of the new lower-bound method.

Summarizing, we have made an important step forward towards understanding how EAs work, how to 
find optimal parameter settings, and which EAs are optimal for certain problems. Note that the method 
itself is not restricted to mutation-based EAs in binary spaces. It is ready to be applied to other search 
spaces and further stochastic search algorithms; either in its pure form or as a part of a more general analysis.